mbzuai-nlp / ArTST
☆20Updated 3 months ago
Related projects:
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling.☆27Updated 7 months ago
- Arabic deep-learning-based diacritization models (Shakkala, Shakkelha) ported to PyTorch☆12Updated last year
- Using YouTube to prepare a speech recognition dataset for any language☆10Updated 3 years ago
- Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downs…☆31Updated 3 years ago
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models☆31Updated 3 years ago
- A transcribed speech dataset in Wolof, Pulaar and Sereer, to support agriculture. Funded by Lacuna Fund.☆10Updated 4 months ago
- ☆40Updated last year
- An implementation of the paper titled "Arabic Speech Emotion Recognition Employing Wav2vec2.0 and HuBERT Based on BAVED Dataset" https://…☆10Updated 2 years ago
- Scripts to create speech corpora from open.bible☆11Updated 2 years ago
- A tiny BERT for low-resource monolingual models☆28Updated 4 months ago
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp…☆12Updated last year
- ☆38Updated last year
- Sequence to sequence model for Arabic punctuation prediction.☆12Updated 4 years ago
- Scripts for working with open.bible data☆23Updated 2 years ago
- docker for HF wav2vec2-sprint☆12Updated 3 years ago
- Word Error Rate Estimation☆10Updated 4 years ago
- [Computer Speech & Language] A transformer-based spelling error correction framework for Bangla and resource scarce Indic languages☆9Updated last month
- Official Repository of the Deep Diacritization Paper☆16Updated 3 years ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages☆13Updated last year
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler☆24Updated 3 years ago
- AsoSoft Speech Corpus for Central-Kurdish Text-To-Speech☆12Updated 2 years ago
- Final training script from the HuggingFace Whisper fine-tuning event, for getting the best results from a fine-tuned model.☆12Updated last year
- ☆11Updated 2 years ago
- 🎯 Speech Recognition Challenge by Speech Lab - IIT Madras☆11Updated 3 years ago
- phone inventory library☆14Updated last year
- Collection of scripts from mHuBERT-147.☆21Updated 2 months ago
- ☆16Updated 3 years ago
- Repository containing the open source code of works published at the FBK MT unit.☆41Updated 2 months ago
- ☆15Updated 5 years ago
- This is a diacritization model for the Arabic language. This model was built/trained using the Tashkeela: the Arabic diacritization corpus on…☆34Updated last year