sanchit-gandhi / codesnippets
☆10 Updated last year
Alternatives and similar repositories for codesnippets:
Users that are interested in codesnippets are comparing it to the libraries listed below
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax. ☆35 Updated 2 years ago
- Experiments for XLM-V Transformers Integration ☆13 Updated 2 years ago
- Experiments with generating open-source language model assistants ☆97 Updated last year
- Consists of the largest (10K) human-annotated code-switched semantic parsing dataset & 170K generated utterances using the CST5 augmentati… ☆39 Updated 2 years ago
- Using short models to classify long texts ☆21 Updated 2 years ago
- Crosslingual Question Answering for African Languages ☆30 Updated 7 months ago
- check your data, before you wreck your model ☆16 Updated 2 years ago
- ☆32 Updated 2 years ago
- ☆20 Updated 2 years ago
- MAFAND-MT ☆55 Updated 9 months ago
- ☆43 Updated 2 months ago
- Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https:/… ☆26 Updated last year
- A PyTorch Lightning Callback for pushing models to the Hugging Face Hub 🤗⚡️ ☆36 Updated 2 years ago
- Implementation of "SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages" paper, accepted to E… ☆26 Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆34 Updated 4 months ago
- QLoRA with Enhanced Multi GPU Support ☆37 Updated last year
- This app is intended to automatically create a corpus for ASR systems using pseudo-labeling. ☆27 Updated last year
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 Updated 2 years ago
- ☆11 Updated 2 months ago
- ☆24 Updated last year
- A collection of scripts to preprocess ASR datasets and finetune language-specific Wav2Vec2 XLSR models ☆31 Updated 4 years ago
- This will hold the data pipeline to convert raw audio data to speech which will act as input dataset for speech-to-text pipeline ☆32 Updated 2 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 Updated 2 years ago
- Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE ☆18 Updated 3 years ago
- ☆17 Updated 2 years ago
- ☆51 Updated last year
- A list of scripts/notebooks I'd like to keep handy ☆17 Updated 8 months ago
- Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation ☆143 Updated last year
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 Updated 2 years ago
- Truly flash implementation of DeBERTa disentangled attention mechanism. ☆46 Updated 3 weeks ago