  • indo-instruct-llama2-32k model card
  • Model Details
  • Developed by: monuminu
  • Backbone Model: LLaMA-2
  • Language(s): English
  • Library: HuggingFace Transformers
  • License: Fine-tuned checkpoints are licensed under the Creative Commons Attribution-NonCommercial 4.0 license (CC BY-NC 4.0)
  • Where to send comments: Feedback and comments on the model can be submitted by opening an issue in the model's Hugging Face repository
  • Contact: For questions and comments about the model
  • Dataset Details
  • Used Datasets
  • alpaca dataset
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained("monuminu/indo-instruct-llama2-32k")
model = AutoModelForCausalLM.from_pretrained(
    "monuminu/indo-instruct-llama2-32k",
    device_map="auto",
    torch_dtype=torch.float16,
    load_in_8bit=True,
    rope_scaling={"type": "dynamic", "factor": 2},  # dynamic RoPE scaling, allows handling of longer inputs
)

prompt = "### User:\nThomas is healthy, but he has to go to the hospital. What could be the reasons?\n\n### Assistant:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
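# generate() rejects token_type_ids for this model; drop the key only if the tokenizer returned it
inputs.pop("token_type_ids", None)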
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

# max_new_tokens should be an integer token budget; 512 is an arbitrary cap, adjust as needed
output = model.generate(**inputs, streamer=streamer, use_cache=True, max_new_tokens=512)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
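
The prompt above follows a "### User: / ### Assistant:" template, and the decoded output contains the prompt followed by the model's reply. The sketch below shows one possible way to build such prompts and to pull out only the reply. The helper names `build_prompt` and `extract_reply` are hypothetical and not part of the model or of Transformers; the template is inferred from the single example prompt above, so treat it as an assumption.

```python
# Hypothetical helpers; the template is inferred from the example prompt above.
USER_TAG = "### User:\n"
ASSISTANT_TAG = "### Assistant:\n"


def build_prompt(user_message: str) -> str:
    """Wrap a user message in the '### User / ### Assistant' template."""
    return f"{USER_TAG}{user_message.strip()}\n\n{ASSISTANT_TAG}"


def extract_reply(decoded_text: str) -> str:
    """Return the text after the last assistant tag.

    Falls back to the whole (stripped) text if the tag is not present.
    """
    marker = ASSISTANT_TAG.strip()
    _, found, reply = decoded_text.rpartition(marker)
    return reply.strip() if found else decoded_text.strip()


if __name__ == "__main__":
    prompt = build_prompt("Thomas is healthy, but he has to go to the hospital. What could be the reasons?")
    print(repr(prompt))
    # A made-up decoded string, used only to illustrate the extraction step.
    print(extract_reply(prompt + "He might be visiting a friend."))
```

With these helpers, `build_prompt(...)` could replace the hand-written `prompt` string, and `extract_reply(output_text)` could be applied to the decoded output to keep only the assistant's part.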
Model size: 6.74B params (Safetensors)
Tensor type: F32 · FP16