edupoux / MVA_2022_SL
☆7 Updated 2 years ago
Alternatives and similar repositories for MVA_2022_SL:
Users who are interested in MVA_2022_SL are comparing it to the repositories listed below
- ☆86 Updated this week
- Finetune VITS and MMS using HuggingFace's tools ☆145 Updated last year
- Library for pruning experts per language pair in NLLB-200 ☆33 Updated last year
- 💬 Language Identification with Support for More Than 2000 Labels -- EMNLP 2023 ☆127 Updated 4 months ago
- A library for preparing data for machine translation research (monolingual preprocessing, bitext mining, etc.) built by the FAIR NLLB te… ☆272 Updated 2 months ago
- Code for Zero-Shot Tokenizer Transfer ☆127 Updated 3 months ago
- Facebook AI Research Sequence-to-Sequence Toolkit written in Python. ☆24 Updated 2 weeks ago
- Repository contains code to fine-tune WhisperASR model ☆23 Updated 2 years ago
- This repository contains a demonstrative implementation for pooling-based models, e.g., DeepPyramidion complementing our paper "Sparsifyi… ☆14 Updated 2 years ago
- The pipeline for the OSCAR corpus ☆168 Updated last year
- Small repo describing how to use Hugging Face's Wav2Vec2 with PyCTCDecode ☆111 Updated 2 years ago
- NTREX -- News Test References for MT Evaluation ☆81 Updated 10 months ago
- ☆87 Updated 4 months ago
- Glot500: Scaling Multilingual Corpora and Language Models to 500 Languages -- ACL 2023 ☆100 Updated 11 months ago
- Pipeline for pulling and processing online language model pretraining data from the web ☆177 Updated last year
- Bicleaner fork that uses neural networks ☆39 Updated 8 months ago
- The FLORES+ Machine Translation Benchmark ☆101 Updated 5 months ago
- MAFAND-MT ☆55 Updated 9 months ago
- Official implementation of "GPT or BERT: why not both?" ☆52 Updated last month
- Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers ☆90 Updated 8 months ago
- Experiments with generating opensource language model assistants ☆97 Updated last year
- A repository containing the code for translating popular LLM benchmarks to German. ☆25 Updated last year
- Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. ☆80 Updated 7 months ago
- Universal Romanizer that can convert any unicode script to roman (latin) script ☆191 Updated 8 months ago
- Minimal sharded dataset loaders, decoders, and utils for multi-modal document, image, and text datasets. ☆157 Updated last year
- Efficient Language Model Training through Cross-Lingual and Progressive Transfer Learning ☆30 Updated 2 years ago
- Repository accompanying "An Open Dataset and Model for Language Identification" (Burchell et al., 2023) ☆71 Updated 2 weeks ago
- Various transformers for FSDP research ☆37 Updated 2 years ago
- ☆353 Updated last year
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 Updated 2 years ago