license: llama2
base_model: elichen3051/llama2-7b-sft-chat-no-template
tags:
  - alignment-handbook
  - trl
  - dpo
  - generated_from_trainer
datasets:
  - HuggingFaceH4/ultrafeedback_binarized
  - HuggingFaceH4/orca_dpo_pairs
  - HuggingFaceH4/cai-conversation-harmless
model-index:
  - name: Llama2-7b-sft-chat-custom-template-dpo
    results: []

Llama2-7b-sft-chat-custom-template-dpo

This model is a fine-tuned version of elichen3051/llama2-7b-sft-chat-no-template, trained with DPO on the HuggingFaceH4/ultrafeedback_binarized, HuggingFaceH4/orca_dpo_pairs, and HuggingFaceH4/cai-conversation-harmless datasets. It achieves the following results on the evaluation set:

  • Loss: 0.4717
  • Rewards/chosen: -1.6807
  • Rewards/rejected: -3.1957
  • Rewards/accuracies: 0.6345
  • Rewards/margins: 1.5150
  • Logps/rejected: -519.5196
  • Logps/chosen: -379.2986
  • Logits/rejected: -2.7275
  • Logits/chosen: -2.7213
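
The reward figures above are related: in TRL's DPO logging, the margin is the chosen reward minus the rejected reward. The sketch below recomputes it from the reported values and, assuming the standard sigmoid DPO loss with no label smoothing (the card does not state the loss variant), compares the loss evaluated at the mean margin with the reported mean loss. The variable and function names are illustrative only.

```python
import math

# Final evaluation metrics as reported in this card.
rewards_chosen = -1.6807
rewards_rejected = -3.1957
reported_margin = 1.5150
reported_loss = 0.4717

# Margin = chosen reward minus rejected reward.
margin = rewards_chosen - rewards_rejected


def sigmoid_dpo_loss(m: float) -> float:
    """-log(sigmoid(m)), written as softplus(-m).

    Assumption: standard sigmoid DPO loss without label smoothing.
    """
    return math.log1p(math.exp(-m))


# Loss of a single hypothetical pair whose margin equals the mean margin.
loss_at_mean_margin = sigmoid_dpo_loss(margin)

print(f"recomputed margin: {margin:.4f} (reported {reported_margin:.4f})")
print(f"loss at mean margin: {loss_at_mean_margin:.4f} (reported mean loss {reported_loss:.4f})")
```

Under that assumption, the loss at the mean margin is a lower bound on the mean per-pair loss, because the loss is convex in the margin. A reported loss above that value is therefore consistent with a spread of per-pair margins around the mean.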

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-07
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 7
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 448
  • total_eval_batch_size: 56
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.03
  • num_epochs: 2
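
The total batch sizes listed above follow from the per-device sizes, the device count, and the gradient accumulation steps. The sketch below reproduces that arithmetic and adds a plain-Python cosine schedule with linear warmup that follows the usual half-cosine formulation. `ASSUMED_TOTAL_STEPS` is a rough estimate, not a value stated in this card, and `cosine_lr` is a hypothetical helper rather than a library function.

```python
import math

# Values taken from the hyperparameter list above.
train_batch_size = 8
eval_batch_size = 8
num_devices = 7
gradient_accumulation_steps = 8
learning_rate = 5e-07
warmup_ratio = 0.03

# Effective batch sizes.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Assumption: the card does not state the total number of optimizer steps.
ASSUMED_TOTAL_STEPS = 424


def cosine_lr(step: int, total_steps: int, peak_lr: float, ratio: float) -> float:
    """Linear warmup to peak_lr, then half-cosine decay to zero (illustrative)."""
    warmup_steps = math.ceil(total_steps * ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
for step in (0, 13, 212, 424):
    print(step, cosine_lr(step, ASSUMED_TOTAL_STEPS, learning_rate, warmup_ratio))
```

With these inputs the two products match the listed totals of 448 and 56. The schedule values depend on the assumed step count and are shown only to illustrate the shape of the curve.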

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.6727 | 0.2032 | 43 | 0.6714 | -0.0530 | -0.0999 | 0.5871 | 0.0470 | -209.9431 | -216.5270 | -2.2167 | -2.2006 |
| 0.6056 | 0.4064 | 86 | 0.6041 | -0.5876 | -0.8878 | 0.6023 | 0.3002 | -288.7347 | -269.9940 | -3.0277 | -3.0177 |
| 0.573 | 0.6096 | 129 | 0.5451 | -0.9286 | -1.6015 | 0.6174 | 0.6729 | -360.0960 | -304.0913 | -2.9301 | -2.9238 |
| 0.5239 | 0.8128 | 172 | 0.5123 | -1.2863 | -2.2358 | 0.6288 | 0.9495 | -423.5324 | -339.8588 | -2.9884 | -2.9803 |
| 0.4668 | 1.0159 | 215 | 0.4945 | -1.4994 | -2.6377 | 0.6439 | 1.1383 | -463.7195 | -361.1752 | -2.5910 | -2.5843 |
| 0.4607 | 1.2191 | 258 | 0.4816 | -1.5810 | -2.8887 | 0.6402 | 1.3077 | -488.8177 | -369.3280 | -2.8026 | -2.7951 |
| 0.5068 | 1.4223 | 301 | 0.4764 | -1.5805 | -3.0061 | 0.6402 | 1.4256 | -500.5590 | -369.2790 | -2.7586 | -2.7513 |
| 0.4724 | 1.6255 | 344 | 0.4730 | -1.6832 | -3.1741 | 0.6383 | 1.4909 | -517.3631 | -379.5493 | -2.6296 | -2.6237 |
| 0.4836 | 1.8287 | 387 | 0.4718 | -1.6795 | -3.1900 | 0.6420 | 1.5105 | -518.9514 | -379.1832 | -2.6434 | -2.6374 |
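
The step and epoch columns of the training results can be used to estimate the run's size. The sketch below derives the evaluation interval and the approximate number of optimizer steps per epoch, and checks that validation loss falls at every evaluation. The estimate of training pairs per epoch assumes every optimizer step consumed a full batch of 448, which the card does not confirm.

```python
# (step, epoch, validation_loss) copied from the training results.
rows = [
    (43, 0.2032, 0.6714),
    (86, 0.4064, 0.6041),
    (129, 0.6096, 0.5451),
    (172, 0.8128, 0.5123),
    (215, 1.0159, 0.4945),
    (258, 1.2191, 0.4816),
    (301, 1.4223, 0.4764),
    (344, 1.6255, 0.4730),
    (387, 1.8287, 0.4718),
]

steps = [s for s, _, _ in rows]
val_losses = [v for _, _, v in rows]

# Gaps between consecutive evaluations, in optimizer steps.
eval_intervals = {b - a for a, b in zip(steps, steps[1:])}

# Optimizer steps per epoch implied by each row.
steps_per_epoch = [s / e for s, e, _ in rows]
mean_steps_per_epoch = sum(steps_per_epoch) / len(steps_per_epoch)

# Assumption: each step used the full effective batch of 448 pairs.
approx_pairs_per_epoch = mean_steps_per_epoch * 448

# True if validation loss strictly decreases between evaluations.
loss_always_decreases = all(b < a for a, b in zip(val_losses, val_losses[1:]))

print("evaluation interval(s):", sorted(eval_intervals))
print(f"steps per epoch: about {mean_steps_per_epoch:.1f}")
print(f"training pairs per epoch: about {approx_pairs_per_epoch:.0f}")
print("validation loss strictly decreasing:", loss_always_decreases)
```

The rows are evenly spaced in steps, so the evaluation interval is a single value. The per-epoch figures are estimates because the epoch column is rounded to four decimals.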

Framework versions

  • Transformers 4.42.0.dev0
  • PyTorch 2.3.1
  • Datasets 2.19.2
  • Tokenizers 0.19.1