Fine-tuned on a 10K stratified sample of instruction-question-answer triplets gathered from the following sources:
- Medical meadow flashcards
- Medical meadow wikidocs
- HealthcareMagic dataset
- Medical meadow MedQA MCQs
- Medinstruct dataset
- Medquad dataset
- iCliniq dataset
- Medical meadow patient info dataset
- GenMedGPT dataset
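The card does not say how the stratified sample was drawn. As a rough illustration only, the sketch below assumes proportional allocation across sources; the `source` field, the `stratified_sample` helper, and the toy records are all hypothetical and not taken from the training pipeline.

```python
import random
from collections import defaultdict

def stratified_sample(records, total, key="source", seed=0):
    """Draw about `total` records, allocating to each stratum in proportion to its size."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for rec in records:
        by_source[rec[key]].append(rec)

    n = len(records)
    sample = []
    for items in by_source.values():
        # Proportional allocation (an assumption), with at least one record per stratum.
        k = max(1, round(total * len(items) / n))
        sample.extend(rng.sample(items, min(k, len(items))))
    return sample

# Toy data: two hypothetical sources with an 80/20 split.
records = [{"source": "source_a", "id": i} for i in range(80)]
records += [{"source": "source_b", "id": i} for i in range(20)]
sample = stratified_sample(records, total=10)
```

With the toy data, the sample keeps the 80/20 ratio of the two sources. Because of rounding, the returned size can differ slightly from `total` when there are many strata.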
Fine-tuned using LoRA for 1 epoch, with rank=64 and alpha=16.
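To make the rank and alpha values concrete, here is a minimal pure-Python sketch of how a LoRA update is usually applied, assuming the standard scaling factor alpha / rank. The card does not state which library, target modules, or scaling variant was used, so the helper names and the toy matrices below are illustrative assumptions.

```python
RANK = 64   # from the model card
ALPHA = 16  # from the model card

def matmul(X, Y):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def lora_effective_weight(W, B, A, alpha, rank):
    """Return W + (alpha / rank) * (B @ A), the usual LoRA-adapted weight."""
    scale = alpha / rank
    delta = matmul(B, A)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# Toy 2x2 frozen weight with rank-64 adapter matrices filled with ones.
W = [[0.0, 0.0], [0.0, 0.0]]
B = [[1.0] * RANK for _ in range(2)]    # shape (2, 64)
A = [[1.0, 1.0] for _ in range(RANK)]   # shape (64, 2)

scale = ALPHA / RANK
W_eff = lora_effective_weight(W, B, A, ALPHA, RANK)
```

With rank=64 and alpha=16 the scaling factor is 0.25, so the low-rank update is damped to a quarter of the raw product of the adapter matrices.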
Usage:
Load using vLLM as follows:
from vllm import LLM, SamplingParams

llm = LLM(model="jiviadmin/biomistral-ft-10k")

sampling_params = SamplingParams(
    max_tokens=1,  # set this to the max_seq_length used in the SFT Trainer
    temperature=0.1,
    skip_special_tokens=True,
    repetition_penalty=1.5,
)

input_data = <YOUR-INPUT-PROMPTS-AS-A-LIST>
prompts = []
outputs_ls = []

# Use the same prompt as in training, without the output part.
# Add special tokens such as [INST] if needed.
TEMPLATE = """{}"""

def add_prompt(sample):
    """Wrap a raw input in the prompt template."""
    return TEMPLATE.format(sample)

for sample in input_data:
    prompts.append(add_prompt(sample))

# Batch inference over all prompts
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    # Keep the first (and only) completion for each prompt
    generated_text = output.outputs[0].text
    outputs_ls.append(generated_text.strip())
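The template comment mentions adding special tokens such as [INST]. The exact training prompt is not given in this card, so the following is only a sketch of what such a template could look like; `INST_TEMPLATE`, `build_prompt`, and the instruction/question layout are assumptions for illustration.

```python
# Hypothetical template: the actual training prompt is not given in the model card.
INST_TEMPLATE = "[INST] {instruction}\n\n{question} [/INST]"

def build_prompt(instruction, question):
    """Fill the hypothetical template, trimming stray whitespace from both fields."""
    return INST_TEMPLATE.format(
        instruction=instruction.strip(),
        question=question.strip(),
    )

example_prompt = build_prompt("Answer the medical question.", " What is anemia? ")
```

Whatever layout is chosen, it should match the prompt used during fine-tuning, with the answer part left out.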
Benchmarks:
| Model | Prompt Type | Temp | Repetition Penalty | Overall Accuracy | Pubmed | MedQA | MedMCQA | Pubmed questions count | MedQA questions count | MedMCQA questions count |
|---|---|---|---|---|---|---|---|---|---|---|
| Biomistral - FT 10K | No RAG | 0.1 | 1.5 | 44.19% | 51.36% | 38.29% | 43.58% | 847 | 935 | 888 |
| Biomistral - FT 10K | RAG: Highest scoring chunk selected | 0.1 | 1.5 | 82.08% | 95.89% | 67.68% | 82.48% | 998 | 984 | 959 |
| Biomistral - FT 10K | RAG: Reranker (BGE V2 M3) used to select chunk | 0.1 | 1.5 | 86.44% | 97.50% | 73.28% | 88.47% | 999 | 988 | 971 |
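The card does not state how Overall Accuracy is computed. The sketch below assumes it is pooled over all questions, that is, total correct answers divided by total question count, with the number of correct answers per benchmark recovered by rounding. The `pooled_accuracy` helper and the row labels are illustrative names; the numbers are copied from the table.

```python
def pooled_accuracy(per_benchmark):
    """per_benchmark: list of (accuracy_percent, question_count) pairs."""
    # Recover the integer number of correct answers for each benchmark.
    correct = sum(round(acc / 100 * n) for acc, n in per_benchmark)
    total = sum(n for _, n in per_benchmark)
    return round(100 * correct / total, 2)

# (accuracy %, question count) for Pubmed, MedQA, MedMCQA, taken from the table.
rows = {
    "No RAG": [(51.36, 847), (38.29, 935), (43.58, 888)],
    "RAG, highest-scoring chunk": [(95.89, 998), (67.68, 984), (82.48, 959)],
    "RAG, reranker": [(97.50, 999), (73.28, 988), (88.47, 971)],
}

overall = {name: pooled_accuracy(vals) for name, vals in rows.items()}
```

Under this assumption the computed values come out as 44.19, 82.08, and 86.44, matching the Overall Accuracy column. Note that the question counts differ between rows, so the three configurations were not scored on exactly the same number of questions.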