Softcatala / nmt-models
Softcatalà neural translation models
☆18 Updated 2 months ago
Alternatives and similar repositories for nmt-models:
Users who are interested in nmt-models are comparing it to the repositories listed below
- Model for recasing and repunctuating ASR transcripts ☆133 Updated 11 months ago
- Tool to fix bitexts and tag near-duplicates for removal ☆30 Updated last month
- Linguistic processing for Common Voice ☆55 Updated last year
- NTREX -- News Test References for MT Evaluation ☆81 Updated 9 months ago
- Scripts for working with open.bible data ☆24 Updated 3 years ago
- Whisper fine-tuning event script to use multiple hf datasets ☆32 Updated 2 years ago
- Seed Machine Translation Data ☆31 Updated 4 months ago
- Scripts to create speech corpora from open.bible ☆13 Updated 3 years ago
- Finite-state script normalization and processing utilities ☆39 Updated last month
- Targeted language identifier, based on FastText and Hunspell. ☆34 Updated last month
- Parse and convert numbers written in French, English, Spanish, Portuguese, German and Catalan into their digit representation. ☆105 Updated last month
- Bicleaner fork that uses neural networks ☆39 Updated 8 months ago
- AfroLID, a powerful neural toolkit for African languages identification which covers 517 African languages. ☆31 Updated 3 weeks ago
- This is a neural spell checker ☆65 Updated 2 years ago
- Multilingual and monolingual corpora focused on Caucasus languages, for Natural Language Processing (NLP) ☆35 Updated 4 months ago
- Bicleaner is a parallel corpus classifier/cleaner that aims at detecting noisy sentence pairs in a parallel corpus. ☆156 Updated 9 months ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆70 Updated 11 months ago
- OpusCleaner is a web interface that helps you select, clean and schedule your data for training machine translation models. ☆51 Updated 2 months ago
- A bidirectional recurrent neural network model with attention mechanism for restoring missing punctuation in unsegmented text ☆36 Updated 4 years ago
- 📝 An easy-to-use package to restore punctuation of the text. ☆114 Updated last year
- Library for pruning experts per language pair in NLLB-200 ☆32 Updated last year
- Finetune VITS and MMS using HuggingFace's tools ☆139 Updated last year
- Extracts plain text, language identification and more metadata from WARC records ☆21 Updated last month
- Spoken Language Identification on Common Voice and AudioSet using Deep Learning ☆37 Updated 2 years ago
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ☆111 Updated 2 years ago
- Audio Preprocessing and finetuning of wav2vec2-large-xlsr model on AI4D Baamtu Datamation - Automatic Speech Recognition in WOLOF Data. ☆17 Updated 3 years ago
- These are lists for a variety of languages containing words that are distinctive to each language. ☆37 Updated 2 years ago
- Machine Translation (MT) Preparation Scripts ☆31 Updated last month
- Universal Romanizer that can convert any unicode script to roman (latin) script ☆189 Updated 8 months ago