Fine-tuned on a 10K stratified sample of instruction-question-answer triplets gathered from the following sources:

Medical meadow flashcards
Medical meadow wikidocs
HealthcareMagic dataset
Medical meadow MedQA MCQs
Medinstruct dataset
Medquad dataset
iCliniq dataset
Medical meadow patient info dataset
GenMedGPT dataset
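
The card does not say how the stratified sample was drawn. A minimal sketch of one common approach, proportional allocation per source, is shown here. The function name `stratified_sample`, the `source` field, and the toy pool sizes are hypothetical and are not taken from the actual training pipeline.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, key, n, seed=0):
    """Draw about n records, keeping each group's share of the full pool."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key]].append(rec)

    sample = []
    for items in groups.values():
        # Proportional allocation; rounding can make the total differ slightly from n.
        k = min(len(items), round(n * len(items) / len(records)))
        sample.extend(rng.sample(items, k))
    return sample

# Toy pool with two hypothetical sources in a 3:1 ratio (assumed sizes).
pool = [{"source": "flashcards", "id": i} for i in range(300)]
pool += [{"source": "wikidocs", "id": i} for i in range(100)]

subset = stratified_sample(pool, key="source", n=40)
print(Counter(rec["source"] for rec in subset))
```

With nine sources of very different sizes, a proportional scheme like this keeps small datasets represented without letting the largest one dominate the 10K sample.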

Fine-tuned using LoRA for 1 epoch, with rank=64, alpha=16
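
To give a feel for what rank=64 and alpha=16 imply, the sketch below computes the standard LoRA scaling factor (alpha / rank) and the number of trainable parameters added per adapted weight matrix. The 4096 x 4096 matrix shape is an assumption for illustration; the card does not state which modules were adapted or whether a scaling variant was used.

```python
RANK = 64    # LoRA rank from the card
ALPHA = 16   # LoRA alpha from the card

def lora_scaling(rank: int, alpha: int) -> float:
    """Standard LoRA scales the low-rank update B @ A by alpha / rank."""
    return alpha / rank

def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for one adapted matrix: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

print(lora_scaling(RANK, ALPHA))            # 0.25
print(lora_param_count(4096, 4096, RANK))   # 524288 (assumed 4096 x 4096 projection)
```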

Usage:

Load using vLLM as follows:

from vllm import LLM, SamplingParams

llm = LLM(model="jiviadmin/biomistral-ft-10k")

sampling_params = SamplingParams(
    max_tokens=1,  # caps the number of generated tokens; set it to the max_seq_length used in the SFT Trainer
    temperature=0.1,
    skip_special_tokens=True,
    repetition_penalty=1.5,
)

input_data = ["<YOUR-INPUT-PROMPT>"]  # replace with your own list of input prompt strings
prompts = []
outputs_ls = []

TEMPLATE = """{}"""  # Same prompt as used in training, minus the output part; add special tokens such as [INST] if needed

def add_prompt(sample):
    prompt = TEMPLATE.format(sample)
    return prompt

for sample in input_data:
    text = add_prompt(sample)
    prompts.append(text)

outputs = llm.generate(prompts, sampling_params) # Batch inference

for output in outputs:
    generated_text = output.outputs[0].text
    outputs_ls.append(generated_text.strip())
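
The template comment mentions optional special tokens such as [INST]. A minimal sketch of wrapping each input in Mistral-style instruction tags follows. Whether this model was trained with these tags is not stated in the card, so `INST_TEMPLATE` and `build_prompt` are hypothetical and should be matched to the actual training prompt.

```python
# Assumed Mistral-style wrapper; the real training template is not given in the card.
INST_TEMPLATE = "[INST] {instruction} [/INST]"

def build_prompt(instruction: str) -> str:
    """Wrap a single instruction in the assumed [INST] ... [/INST] tags."""
    return INST_TEMPLATE.format(instruction=instruction.strip())

questions = ["  What is anemia? ", "List common symptoms of hypothyroidism."]
prompts = [build_prompt(q) for q in questions]
print(prompts[0])
```

The resulting list can be passed to `llm.generate` in place of the plain prompts.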

Benchmarks:

Model	Prompt Type	Temperature	Repetition Penalty	Overall Accuracy	PubMed Accuracy	MedQA Accuracy	MedMCQA Accuracy	PubMed Question Count	MedQA Question Count	MedMCQA Question Count
Biomistral - FT 10K	No RAG	0.1	1.5	44.19%	51.36%	38.29%	43.58%	847	935	888
Biomistral - FT 10K	RAG: Highest scoring chunk selected	0.1	1.5	82.08%	95.89%	67.68%	82.48%	998	984	959
Biomistral - FT 10K	RAG: Reranker (BGE V2 M3) used to select chunk	0.1	1.5	86.44%	97.50%	73.28%	88.47%	999	988	971
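
The overall accuracy in each row appears consistent with a question-count-weighted average of the three benchmark accuracies rather than a simple mean. The sketch below recovers the number of correct answers per benchmark from the reported percentages and counts, then recomputes the overall figure. The function name `overall_accuracy` is illustrative, and this is a reading of the table, not the authors' evaluation code.

```python
def overall_accuracy(per_benchmark):
    """per_benchmark: list of (accuracy_percent, question_count) pairs."""
    # Recover whole-number correct counts from the rounded percentages.
    correct = sum(round(acc / 100 * n) for acc, n in per_benchmark)
    total = sum(n for _, n in per_benchmark)
    return 100 * correct / total

# (accuracy %, question count) for PubMed, MedQA, MedMCQA, copied from the table.
no_rag = [(51.36, 847), (38.29, 935), (43.58, 888)]
rag_top_chunk = [(95.89, 998), (67.68, 984), (82.48, 959)]
rag_reranker = [(97.50, 999), (73.28, 988), (88.47, 971)]

for name, row in [("No RAG", no_rag),
                  ("RAG, top chunk", rag_top_chunk),
                  ("RAG, reranker", rag_reranker)]:
    print(f"{name}: {overall_accuracy(row):.2f}%")
```

Note that the question counts differ between rows, so the overall numbers are not computed over an identical question set.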
