Does Qwen 2.5 support Thai language?

#4
by Suppadate - opened

Here is my code:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
print(tokenizer.tokenize("สวัสดี"))

Output: ['à¸ªà¸§', 'à¸±', 'à¸ªà¸Ķ', 'à¸µ']
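These tokens are most likely not broken. Assuming Qwen2.5 uses a GPT-2 style byte-level BPE tokenizer, tokenize() returns each UTF-8 byte mapped to a printable character, so a single Thai character (3 bytes in UTF-8) shows up as a string like 'à¸ª'. Part of that output may also get rendered as Thai when pasted into a web page. The sketch below re-implements the GPT-2 byte-to-unicode table in plain Python (an assumption that Qwen2.5 uses the same table; bytes_to_unicode, to_byte_level and tokens_to_text are illustrative helpers, not transformers APIs) and shows that the reported tokens decode back to the original text.

```python
# Minimal sketch, assuming the GPT-2 style byte-to-unicode table used by
# byte-level BPE tokenizers. Helper names here are hypothetical.


def bytes_to_unicode() -> dict[int, str]:
    # Printable byte values keep their own code point:
    # '!'..'~', '¡'..'¬' and '®'..'ÿ'.
    bs = (
        list(range(0x21, 0x7E + 1))
        + list(range(0xA1, 0xAC + 1))
        + list(range(0xAE, 0xFF + 1))
    )
    cs = bs[:]
    n = 0
    # Remaining bytes (space, control characters, etc.) are shifted above U+00FF.
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return {b: chr(c) for b, c in zip(bs, cs)}


BYTE_TO_CHAR = bytes_to_unicode()
CHAR_TO_BYTE = {c: b for b, c in BYTE_TO_CHAR.items()}


def to_byte_level(text: str) -> str:
    """Show text the way a byte-level BPE vocabulary stores it."""
    return "".join(BYTE_TO_CHAR[b] for b in text.encode("utf-8"))


def tokens_to_text(tokens: list[str]) -> str:
    """Map byte-level token strings back to the original UTF-8 text."""
    raw = bytes(CHAR_TO_BYTE[ch] for tok in tokens for ch in tok)
    return raw.decode("utf-8")


tokens = ["à¸ªà¸§", "à¸±", "à¸ªà¸Ķ", "à¸µ"]
print(tokens_to_text(tokens))  # สวัสดี
print(to_byte_level("ส"))      # à¸ª
```

With the real tokenizer loaded, tokenizer.convert_tokens_to_string(tokens) or tokenizer.decode(tokenizer.encode("สวัสดี")) should give the Thai text back in the same way, so the split ['สว', 'ั', 'สด', 'ี'] is what the model actually sees.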

Can you suggest how to fix this, or tell me where I can find vocab.json, tokenizer.json, etc.?

Suppadate changed the discussion title from "Does Qwen 2.5 support That language?" to "Does Qwen 2.5 support Thai language?"
