NathanGodey / headless-lm

Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https://arxiv.org/abs/2309.08351)

☆27

Alternatives and similar repositories for headless-lm

Users who are interested in headless-lm are comparing it to the repositories listed below.

google-research-datasets / QAmeleon
QAmeleon introduces synthetic multilingual QA data using PaLM, a 540B large language model. This dataset was generated by prompt tuning P…
☆34 Updated last year
gsarti / t5-flax-gcp
Tutorial to pretrain & fine-tune a 🤗 Flax T5 model on a TPUv3-8 with GCP
☆58 Updated 3 years ago
allenai / EmbeddingRecycling
Embedding Recycling for Language models
☆39 Updated 2 years ago
MeLeLBGU / SaGe
Code for SaGe subword tokenizer (EACL 2023)
☆25 Updated 8 months ago
HomebrewML / Olmax
HomebrewNLP in JAX flavour for maintainable TPU training
☆50 Updated last year
huggingface / olm-training
Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data; it should also work with any Hugging Face text dataset.
☆93 Updated 2 years ago
Knowledgator / FlashDeBERTa
Truly flash implementation of the DeBERTa disentangled attention mechanism.
☆62 Updated 2 months ago
UKPLab / on-emergence
Code and files for the paper "Are Emergent Abilities in Large Language Models just In-Context Learning"
☆33 Updated 6 months ago
ltgoslo / ltg-bert
LTG-Bert
☆33 Updated last year
orevaahia / magnet-tokenization
☆13 Updated 8 months ago
lucidrains / memory-editable-transformer
My explorations into editing the knowledge and memories of an attention network
☆35 Updated 2 years ago
kpu / fasterText
Library for fast text representation and classification.
☆30 Updated last year
nbroad1881 / strideformer
Using short models to classify long texts
☆21 Updated 2 years ago
catie-aq / flashT5
A fast implementation of T5/UL2 in PyTorch using Flash Attention
☆107 Updated 4 months ago
cimeister / typical-sampling
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
☆82 Updated 3 years ago
ZurichNLP / mbr
Minimum Bayes Risk Decoding for Hugging Face Transformers
☆58 Updated last year
EleutherAI / rnngineering
Engineering the state of RNN language models (Mamba, RWKV, etc.)
☆32 Updated last year
castorini / hf-spacerini
Plug-and-play Search Interfaces with Pyserini and Hugging Face
☆32 Updated 2 years ago
jungokasai / beam_with_patience
☆46 Updated 3 years ago
bminixhofer / tokenkit
A toolkit implementing advanced methods to transfer models and model knowledge across tokenizers.
☆40 Updated last month
BlinkDL / SmallInitEmb
LayerNorm(SmallInit(Embedding)) in a Transformer to improve convergence
☆59 Updated 3 years ago
microsoft / encoder-decoder-slm
Efficient encoder-decoder architecture for small language models (≤1B parameters) with cross-architecture knowledge distillation and visi…
☆29 Updated 5 months ago
Rallio67 / language-model-agents
Experiments with generating open-source language model assistants
☆97 Updated 2 years ago
lucy3 / whos_filtered
☆14 Updated 10 months ago
warner-benjamin / optimi
Fast, Modern, and Low Precision PyTorch Optimizers
☆103 Updated last week
microsoft / mutransformers
Some common Hugging Face transformers in maximal update parametrization (µP)
☆82 Updated 3 years ago
chandar-lab / NeoBERT
☆79 Updated 2 months ago
nreimers / se-pytorch-xla
☆21 Updated 3 years ago
kyleliang919 / Long-context-transformers
Exploring finetuning public checkpoints on filtered 8K sequences from the Pile
☆116 Updated 2 years ago
Knowledgator / TurboT5
Truly flash T5 realization!
☆68 Updated last year