NathanGodey / headless-lm
Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https://arxiv.org/abs/2309.08351)
☆25 Updated 10 months ago
Alternatives and similar repositories for headless-lm:
Users that are interested in headless-lm are comparing it to the libraries listed below
- Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP ☆58 Updated 2 years ago
- QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P… ☆34 Updated last year
- LTG-Bert ☆29 Updated last year
- Embedding Recycling for Language models ☆38 Updated last year
- Using short models to classify long texts ☆21 Updated last year
- Code for SaGe subword tokenizer (EACL 2023) ☆24 Updated 3 months ago
- My explorations into editing the knowledge and memories of an attention network ☆34 Updated 2 years ago
- Library for fast text representation and classification. ☆28 Updated last year
- Experiments for XLM-V Transformers Integration ☆13 Updated 2 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any hugging face text dataset. ☆93 Updated 2 years ago
- Index of URLs to pdf files all over the internet and scripts ☆21 Updated last year
- Repository for fine-tuning Transformers 🤗 based seq2seq speech models in JAX/Flax. ☆35 Updated 2 years ago
- Codes and files for the paper Are Emergent Abilities in Large Language Models just In-Context Learning ☆33 Updated last month
- Ranking of fine-tuned HF models as base models. ☆35 Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆18 Updated 3 weeks ago
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆46 Updated last year
- Pre-train Static Word Embeddings ☆47 Updated last month
- ☆14 Updated 4 months ago
- HomebrewNLP in JAX flavour for maintainable TPU-Training ☆48 Updated last year
- ☆46 Updated 2 years ago
- Engineering the state of RNN language models (Mamba, RWKV, etc.) ☆32 Updated 9 months ago
- A library for fast and efficient mLSTM Kernels. ☆8 Updated 2 months ago
- A tiny BERT for low-resource monolingual models ☆31 Updated 5 months ago
- ☆44 Updated 3 months ago
- Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages ☆13 Updated 2 years ago
- Applying "Load What You Need: Smaller Versions of Multilingual BERT" to LaBSE ☆18 Updated 3 years ago
- Execute arbitrary SQL queries on 🤗 Datasets ☆32 Updated last year
- A library for data streaming and augmentation ☆20 Updated 11 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 Updated 9 months ago