metadata
license: apache-2.0
base_model: facebook/wav2vec2-large-xlsr-53
tags:
  - generated_from_trainer
datasets:
  - common_voice
metrics:
  - wer
model-index:
  - name: wav2vec2-large-xlsr53-zh-cn-subset20-colab
    results:
      - task:
          name: Automatic Speech Recognition
          type: automatic-speech-recognition
        dataset:
          name: common_voice
          type: common_voice
          config: zh-CN
          split: test[:20%]
          args: zh-CN
        metrics:
          - name: Wer
            type: wer
            value: 0.9503424657534246

wav2vec2-large-xlsr53-zh-cn-subset20-colab

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 on the zh-CN configuration of the common_voice dataset. On the evaluation set (the test[:20%] split), the final checkpoint (step 20800) achieves the following results:

  • Loss: 2.0566
  • Wer: 0.9503
  • Cer: 0.3333
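
The gap between Wer (0.95) and Cer (0.33) deserves a note. One plausible explanation, assuming the transcripts are unsegmented Chinese text, is that a whitespace-based word error rate treats a whole sentence as one "word", so a single wrong character counts as a full word error. The sketch below illustrates the effect with a plain Levenshtein distance. The example sentences and the helper names (`edit_distance`, `wer`, `cer`) are hypothetical; the card does not show the evaluation script used for this model.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate using whitespace tokenization."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / len(ref_words)


def cer(ref, hyp):
    """Character error rate, ignoring spaces."""
    ref_chars = ref.replace(" ", "")
    return edit_distance(ref_chars, hyp.replace(" ", "")) / len(ref_chars)


# Hypothetical example: one wrong character in a six-character sentence.
reference = "今天天气很好"
hypothesis = "今天天汽很好"

print(wer(reference, hypothesis))  # 1.0: the sentence is a single "word", and it is wrong
print(cer(reference, hypothesis))  # 1/6: one wrong character out of six
```

Under this assumption, Cer is the more informative of the two metrics for this model.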

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 13
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 26
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • num_epochs: 100
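
Two of these values follow from the others. The total train batch size is the per-device batch size times the gradient accumulation steps (13 × 2 = 26, which implies a single device), and the linear scheduler ramps the learning rate up over the warmup steps and then decays it linearly to zero. The sketch below reproduces both in plain Python. The total step count of 21,000 is an assumption estimated from the Training results table (step 20800 falls at epoch 99.05 of 100), and the function names are hypothetical rather than taken from the training script.

```python
LEARNING_RATE = 3e-4
TRAIN_BATCH_SIZE = 13
GRAD_ACCUM_STEPS = 2
WARMUP_STEPS = 500
# Assumption: roughly 21,000 optimizer steps over 100 epochs,
# estimated from the training log rather than stated in the card.
TOTAL_STEPS = 21_000


def effective_batch_size(per_device, accum_steps, num_devices=1):
    """Number of samples contributing to each optimizer step."""
    return per_device * accum_steps * num_devices


def linear_lr(step, base_lr=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Linear warmup from 0 to base_lr, then linear decay to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))  # 26
for step in (0, 250, 500, 10_750, 21_000):
    print(step, linear_lr(step))
```

With these settings the learning rate peaks at 0.0003 at step 500 and is back to half of that around step 10,750.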

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
| --- | --- | --- | --- | --- | --- |
| No log | 1.9 | 400 | 6.7551 | 1.0 | 1.0 |
| 34.7845 | 3.81 | 800 | 6.4563 | 1.0 | 1.0 |
| 6.4358 | 5.71 | 1200 | 4.2319 | 1.0074 | 0.7454 |
| 4.2052 | 7.62 | 1600 | 2.6538 | 1.0200 | 0.5562 |
| 2.3906 | 9.52 | 2000 | 2.3565 | 1.0063 | 0.5147 |
| 2.3906 | 11.43 | 2400 | 2.1287 | 0.9863 | 0.4822 |
| 1.93 | 13.33 | 2800 | 1.9585 | 0.9812 | 0.4528 |
| 1.6322 | 15.24 | 3200 | 1.8771 | 0.9937 | 0.4381 |
| 1.3629 | 17.14 | 3600 | 1.8405 | 0.9926 | 0.4242 |
| 1.166 | 19.05 | 4000 | 1.7674 | 0.9989 | 0.4140 |
| 1.166 | 20.95 | 4400 | 1.7879 | 0.9795 | 0.4047 |
| 0.9915 | 22.86 | 4800 | 1.7597 | 1.0126 | 0.4080 |
| 0.8517 | 24.76 | 5200 | 1.7726 | 0.9829 | 0.3966 |
| 0.7143 | 26.67 | 5600 | 1.7623 | 0.9732 | 0.3863 |
| 0.6267 | 28.57 | 6000 | 1.8164 | 0.9720 | 0.3863 |
| 0.6267 | 30.48 | 6400 | 1.8136 | 0.9680 | 0.3801 |
| 0.5389 | 32.38 | 6800 | 1.8696 | 0.9652 | 0.3812 |
| 0.4764 | 34.29 | 7200 | 1.8625 | 0.9663 | 0.3744 |
| 0.4095 | 36.19 | 7600 | 1.8868 | 0.9618 | 0.3683 |
| 0.3594 | 38.1 | 8000 | 1.8834 | 0.9623 | 0.3699 |
| 0.3594 | 40.0 | 8400 | 1.9155 | 0.9589 | 0.3670 |
| 0.3064 | 41.9 | 8800 | 1.9268 | 0.9652 | 0.3688 |
| 0.2825 | 43.81 | 9200 | 1.9527 | 0.9697 | 0.3674 |
| 0.2524 | 45.71 | 9600 | 1.9726 | 0.9686 | 0.3617 |
| 0.2272 | 47.62 | 10000 | 1.9594 | 0.9629 | 0.3619 |
| 0.2272 | 49.52 | 10400 | 1.9799 | 0.9635 | 0.3607 |
| 0.2042 | 51.43 | 10800 | 2.0175 | 0.9669 | 0.3582 |
| 0.1975 | 53.33 | 11200 | 2.0246 | 0.9589 | 0.3571 |
| 0.1827 | 55.24 | 11600 | 2.0535 | 0.9703 | 0.3600 |
| 0.1677 | 57.14 | 12000 | 2.0458 | 0.9583 | 0.3555 |
| 0.1677 | 59.05 | 12400 | 2.0893 | 0.9572 | 0.3583 |
| 0.1626 | 60.95 | 12800 | 2.0729 | 0.9600 | 0.3557 |
| 0.155 | 62.86 | 13200 | 2.0706 | 0.9572 | 0.3538 |
| 0.1456 | 64.76 | 13600 | 2.0761 | 0.9532 | 0.3553 |
| 0.1337 | 66.67 | 14000 | 2.0349 | 0.9589 | 0.3474 |
| 0.1337 | 68.57 | 14400 | 2.0844 | 0.9549 | 0.3484 |
| 0.1274 | 70.48 | 14800 | 2.0874 | 0.9578 | 0.3505 |
| 0.1198 | 72.38 | 15200 | 2.0813 | 0.9526 | 0.3473 |
| 0.1164 | 74.29 | 15600 | 2.0866 | 0.9498 | 0.3473 |
| 0.1105 | 76.19 | 16000 | 2.0688 | 0.9486 | 0.3421 |
| 0.1105 | 78.1 | 16400 | 2.0854 | 0.9498 | 0.3431 |
| 0.1053 | 80.0 | 16800 | 2.0749 | 0.9503 | 0.3414 |
| 0.1 | 81.9 | 17200 | 2.0622 | 0.9543 | 0.3407 |
| 0.0977 | 83.81 | 17600 | 2.0678 | 0.9532 | 0.3396 |
| 0.0906 | 85.71 | 18000 | 2.0650 | 0.9515 | 0.3383 |
| 0.0906 | 87.62 | 18400 | 2.0631 | 0.9492 | 0.3378 |
| 0.0867 | 89.52 | 18800 | 2.0633 | 0.9521 | 0.3365 |
| 0.0836 | 91.43 | 19200 | 2.0606 | 0.9532 | 0.3346 |
| 0.0819 | 93.33 | 19600 | 2.0671 | 0.9538 | 0.3355 |
| 0.0768 | 95.24 | 20000 | 2.0661 | 0.9509 | 0.3338 |
| 0.0768 | 97.14 | 20400 | 2.0564 | 0.9498 | 0.3335 |
| 0.0752 | 99.05 | 20800 | 2.0566 | 0.9503 | 0.3333 |
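
In the log above, the validation loss bottoms out early (1.7597 at step 4800) and drifts upward afterwards while the training loss keeps falling, whereas Cer keeps improving until the final step. Which checkpoint counts as "best" therefore depends on the metric. The sketch below shows the selection on a handful of rows copied from the log; the `best_by` helper and the column constants are hypothetical and not part of the original training script.

```python
# Column indices within each row tuple.
STEP, VAL_LOSS, WER, CER = 0, 1, 2, 3

# (step, validation loss, Wer, Cer) for a few rows copied from the training log.
rows = [
    (4000, 1.7674, 0.9989, 0.4140),
    (4800, 1.7597, 1.0126, 0.4080),
    (5600, 1.7623, 0.9732, 0.3863),
    (16000, 2.0688, 0.9486, 0.3421),
    (20800, 2.0566, 0.9503, 0.3333),
]


def best_by(rows, column):
    """Return the row with the lowest value in the given column."""
    return min(rows, key=lambda row: row[column])


for name, column in (("validation loss", VAL_LOSS), ("Wer", WER), ("Cer", CER)):
    row = best_by(rows, column)
    print(f"lowest {name}: {row[column]} at step {row[STEP]}")
```

On these rows the three criteria pick three different steps (4800, 16000 and 20800), so the metric used for checkpoint selection should be chosen deliberately.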

Framework versions

  • Transformers 4.32.0.dev0
  • PyTorch 2.0.1+cu118
  • Datasets 2.14.3
  • Tokenizers 0.13.3