ad-freiburg / whitespace-correction
Fast whitespace correction with Transformers
☆14 · Updated 6 months ago
Related projects
Alternatives and complementary repositories for whitespace-correction
- A High-Quality and Large-Scale Dataset for English-Vietnamese Speech Translation (INTERSPEECH 2022) ☆20 · Updated 3 months ago
- Transformation of spoken text to written text ☆28 · Updated 5 months ago
- Vietnamese Punctuation Prediction using Pretrained Language Models ☆13 · Updated 2 years ago
- Whisper finetuned on VinBigdata-VLSP2020-100h + KenLM ☆33 · Updated last year
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 · Updated 8 months ago
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp… ☆12 · Updated last year
- Vi_G2P or ViG2P: G2P package for Vietnamese, based on vPhon and phonological knowledge, that converts raw text (graphemes) to IPA ☆67 · Updated 4 months ago
- End-to-End Vietnamese Speech Recognition using wav2vec 2.0 ☆93 · Updated 3 years ago
- Parallelized automatic corpus collection for ASR. Forked from https://github.com/EgorLakomkin/KTSpeechCrawler ☆23 · Updated 3 years ago
- Fine-tune multiple pre-trained Transformer-based models to solve the Vietnamese Fake News Detection problem (ReINTEL) in the VLSP2020 shared task ☆18 · Updated 3 years ago
- Repository for Findings of EMNLP 2020 "Context-aware Stand-alone Neural Spelling Correction" ☆18 · Updated 3 years ago
- Correction of spaces with character-based neural language models. ☆13 · Updated 2 years ago
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆74 · Updated last month
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 · Updated 2 years ago
- BLOOM+1: Adapting BLOOM model to support a new unseen language ☆70 · Updated 8 months ago
- Repository containing the open source code of works published at the FBK MT unit. ☆42 · Updated 4 months ago
- Final training script from the Hugging Face Whisper fine-tuning event, for getting the best results on the fine-tuned model. ☆12 · Updated last year
- Showcases training for various NLP downstream tasks with pre-trained language models using PyTorch Lightning ☆12 · Updated 2 years ago
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆19 · Updated 2 years ago
- This repository contains data used in the NAACL 2021 Paper - Proteno: Text Normalization with Limited Data for Fast Deployment in Text to… ☆42 · Updated 3 years ago
- Vocabulary Trimming (VT) is a model compression technique, which reduces a multilingual LM vocabulary to a target language by deleting ir… ☆30 · Updated 2 weeks ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 · Updated 2 years ago
- Reduce the size of pretrained Hugging Face models via vocabulary trimming. ☆43 · Updated last year
- One script for XLS-R/XLSR/Whisper fine-tuning ☆39 · Updated last year
- PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation (EMNLP 2021) ☆39 · Updated 3 months ago
- asr2k ☆48 · Updated 5 months ago