Training and evaluation code for the paper "Headless Language Models: Learning without Predicting with Contrastive Weight Tying" (https://arxiv.org/abs/2309.08351)
★28 · Apr 17, 2024 · Updated last year
Alternatives and similar repositories for headless-lm
Users who are interested in headless-lm are comparing it to the libraries listed below.
- 🤗 A collection of templates for Hugging Face Spaces ★35 · Oct 9, 2023 · Updated 2 years ago
- ★10 · Oct 2, 2024 · Updated last year
- ★10 · Oct 15, 2019 · Updated 6 years ago
- This repository contains sample code to benchmark popular time series forecasting algorithms using GluonTS in AWS SageMaker Notebook Ins… ★13 · Jul 26, 2021 · Updated 4 years ago
- Multilingual Evaluation of English-Centric LLMs via Cross-Lingual Alignment ★11 · Apr 6, 2025 · Updated 10 months ago
- T-Projection is a method to perform high-quality Annotation Projection of Sequence Labeling datasets. ★13 · Nov 21, 2023 · Updated 2 years ago
- Resources related to EMNLP 2021 paper "FAME: Feature-Based Adversarial Meta-Embeddings for Robust Input Representations" ★13 · Dec 14, 2021 · Updated 4 years ago
- Evaluate your models with A/B test experiments ★14 · Jan 5, 2023 · Updated 3 years ago
- PyTorch implementation of the Flash Spectral Transform Unit. ★21 · Sep 19, 2024 · Updated last year
- Minimal code to train ELMo models in recent versions of TensorFlow ★14 · Apr 30, 2023 · Updated 2 years ago
- DPO, but faster ★48 · Dec 6, 2024 · Updated last year
- Goldfish: Monolingual language models for 350 languages. ★23 · Aug 25, 2024 · Updated last year
- Set-Equivariant Deep Learning Models ★22 · Dec 23, 2021 · Updated 4 years ago
- An extension of the Transformers library to include a T5ForSequenceClassification class. ★40 · Apr 17, 2023 · Updated 2 years ago
- Code for AAAI 2023 paper: "Alignment-Enriched Tuning for Patch-Level Pre-trained Document Image Models" ★18 · Dec 6, 2022 · Updated 3 years ago
- Use sync mode Playwright interactively, inside a Jupyter notebook ★19 · Jan 29, 2026 · Updated last month
- Code for the paper "Getting the most out of your tokenizer for pre-training and domain adaptation" ★22 · Feb 14, 2024 · Updated 2 years ago
- Sentence segmentation with wtpsplit's state-of-the-art Segment any Text (SaT) models ★36 · Oct 1, 2025 · Updated 5 months ago
- Temporarily remove unused tokens during training to save RAM and improve speed. ★23 · Jun 15, 2025 · Updated 8 months ago
- Overview of corpora/datasets for Germanic low-resource languages and dialects. Accompanies "A Survey of Corpora for Germanic Low-Resource… ★26 · Feb 16, 2026 · Updated 2 weeks ago
- Software for transferring pre-trained English models to foreign languages ★19 · Mar 20, 2023 · Updated 2 years ago
- ★75 · Jul 2, 2021 · Updated 4 years ago
- Scripts to convert datasets from various sources to Hugging Face Datasets. ★57 · Oct 26, 2022 · Updated 3 years ago
- Implementation of the paper "Fine-Tuning Transformers: Vocabulary Transfer" https://arxiv.org/pdf/2112.14569.pdf ★20 · Dec 28, 2021 · Updated 4 years ago
- Repo for training MLMs, CLMs, or T5-type models on the OLM pretraining data, but it should work with any Hugging Face text dataset. ★96 · Feb 9, 2023 · Updated 3 years ago
- Exploring Few-Shot Adaptation of Language Models with Tables ★24 · Aug 22, 2022 · Updated 3 years ago
- High-performance, asynchronous Python HTTP client library designed for faster file transfers using concurrency, semaphores, and fault-tol… ★59 · May 12, 2025 · Updated 9 months ago
- Testing and training detection models for emoji-based hate speech. ★24 · May 15, 2022 · Updated 3 years ago
- C++ inference wrappers for running blazing fast embedding services on your favourite serverless like AWS Lambda. By Prithivi Da, PRs welc… ★23 · Mar 4, 2024 · Updated 2 years ago
- A collection of notebooks for Natural Language Processing ★25 · Jan 13, 2025 · Updated last year
- A model(ing framework) for sample efficient OCR ★64 · Apr 7, 2023 · Updated 2 years ago
- A tool for benchmarking LLMs on Modal ★48 · Aug 29, 2025 · Updated 6 months ago
- This project shows how to derive the total number of training tokens from a large text dataset from 🤗 datasets with Apache Beam and Data… ★27 · Oct 20, 2022 · Updated 3 years ago
- Research code for the paper "How Good is Your Tokenizer? On the Monolingual Performance of Multilingual Language Models" ★28 · Oct 3, 2021 · Updated 4 years ago
- Truly flash implementation of DeBERTa's disentangled attention mechanism. ★78 · Feb 10, 2026 · Updated 3 weeks ago
- LTG-Bert ★34 · Jan 8, 2024 · Updated 2 years ago
- Repo for: When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment ★38 · Jun 5, 2023 · Updated 2 years ago
- SeeGULL is a broad-coverage stereotype dataset in English containing stereotypes about identity groups spanning 178 countries across 8 di… ★38 · Sep 25, 2023 · Updated 2 years ago
- This repository shows various ways of deploying a vision model (TensorFlow) from 🤗 Transformers. ★30 · Aug 22, 2022 · Updated 3 years ago