---
language: ary
metrics:
- wer
tags:
- audio
- automatic-speech-recognition
- speech
- xlsr-fine-tuning-week
license: apache-2.0
model-index:
- name: XLSR Wav2Vec2 Moroccan Arabic dialect by Boumehdi
  results:
  - task:
      name: Speech Recognition
      type: automatic-speech-recognition
    metrics:
    - name: Test WER
      type: wer
      value: 49.68
---

# Wav2Vec2-Large-XLSR-53-Moroccan-Darija

[othrif/wav2vec2-large-xlsr-moroccan](https://huggingface.co/othrif/wav2vec2-large-xlsr-moroccan) fine-tuned on 6 hours of labeled Darija audio.

I have also added 3 phonetic units to this model: ڭ (/g/), ڤ (/v/) and پ (/p/). For example: ڭال, ڤيديو, پودكاست.

## Usage

The model can be used directly (without a language model) as follows:

```python
import librosa
import torch
from transformers import Wav2Vec2CTCTokenizer, Wav2Vec2ForCTC, Wav2Vec2Processor

# tokenizer built from a local copy of the vocabulary file (vocab.json)
tokenizer = Wav2Vec2CTCTokenizer("./vocab.json", unk_token="[UNK]", pad_token="[PAD]", word_delimiter_token="|")
processor = Wav2Vec2Processor.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija', tokenizer=tokenizer)
model = Wav2Vec2ForCTC.from_pretrained('boumehdi/wav2vec2-large-xlsr-moroccan-darija')

# load the audio data, resampled to 16 kHz (use your own wav file here!)
input_audio, sr = librosa.load('file.wav', sr=16000)

# convert the raw waveform into model input values
input_values = processor(input_audio, sampling_rate=16000, return_tensors="pt", padding=True).input_values

# retrieve logits (no gradients are needed for inference)
with torch.no_grad():
    logits = model(input_values).logits
tokens = torch.argmax(logits, dim=-1)

# greedy CTC decoding (no language model): merge repeated tokens and drop padding
transcription = tokenizer.batch_decode(tokens)

# print the output
print(transcription)
```

Here's the output:

ڭالت ليا هاد السيد هادا ما كاينش بحالو

(roughly: "She told me this man, there is no one like him.")

## Evaluation

**Test WER**: 49.68

**Training Loss**: 9.88

**Validation Loss**: 45.24

The high validation loss is mainly due to the fact that Darija has no standard spelling and the same word can be written in many ways, so a transcription that differs from the reference only in spelling is still penalized.

## Future Work

I am currently working on improving this model. A new version will be available soon.
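As a reference for the WER figure in the Evaluation section: word error rate is the word-level edit distance (substitutions + deletions + insertions) between the transcription and the reference, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition, assuming simple whitespace tokenization. The function name `word_error_rate` is hypothetical, and this is not necessarily the script used to obtain 49.68. It also shows why spelling variation hurts the score: writing ڭالت with the alternative spelling قالت counts as a full substitution.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / number of reference words.

    Assumes whitespace tokenization; no text normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


# one spelling variant out of four words gives a WER of 0.25 (25%)
print(word_error_rate("ڭالت ليا هاد السيد", "قالت ليا هاد السيد"))
```

Because every spelling variant is scored as an error, normalizing spellings before scoring would lower the measured WER without changing the model itself.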