Add the special token '<|im_end|>' to the tokenizer to fix generation not stopping at <|im_end|>

#12

When running inference on 'Llama3-ChatQA-1.5-70B' with vLLM, generation does not stop at the special token '<|im_end|>' and keeps going, as shown in the figure below. This PR adds <|im_end|> to the tokenizer; the token's id also needs to be mapped as an EOS token in generation_config.json.
(Figure: screenshot of vLLM output that keeps generating past <|im_end|>.)
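
For anyone applying the same fix to a local copy of the model, the sketch below shows the idea with the `transformers` API. It is a minimal, hypothetical example: the directory path is illustrative, and the PR itself edits the repo's config files directly rather than running this code.

```python
# Minimal sketch of the fix, assuming a local checkout of the model files.
# The path below is illustrative; the PR edits the repo files directly.
from transformers import AutoTokenizer, GenerationConfig

model_dir = "./Llama3-ChatQA-1.5-70B"  # hypothetical local path

# Register <|im_end|> as a special token so the tokenizer treats it atomically.
tokenizer = AutoTokenizer.from_pretrained(model_dir)
tokenizer.add_special_tokens({"additional_special_tokens": ["<|im_end|>"]})
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")

# Map the token as an additional EOS id in generation_config.json so that
# generate() and vLLM treat it as a stop signal.
gen_config = GenerationConfig.from_pretrained(model_dir)
eos_ids = gen_config.eos_token_id
if not isinstance(eos_ids, list):
    eos_ids = [eos_ids]
gen_config.eos_token_id = eos_ids + [im_end_id]

tokenizer.save_pretrained(model_dir)
gen_config.save_pretrained(model_dir)
```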

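Until the config change is merged, a client-side workaround is to pass the token as a stop string in vLLM's sampling parameters. This is a sketch using vLLM's offline `LLM` API; the repo id and prompt are placeholders assumed from context.

```python
# Client-side workaround: tell vLLM to stop at <|im_end|> explicitly,
# without changing the model's tokenizer or generation_config.json.
from vllm import LLM, SamplingParams

llm = LLM(model="nvidia/Llama3-ChatQA-1.5-70B")  # repo id assumed from context
params = SamplingParams(max_tokens=256, stop=["<|im_end|>"])

outputs = llm.generate(["User: What is ChatQA?\n\nAssistant:"], params)
print(outputs[0].outputs[0].text)
```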