☆108 · Mar 12, 2024 · Updated last year
Alternatives and similar repositories for Next-Token-Failures
Users who are interested in Next-Token-Failures are comparing it to the repositories listed below
- ☆19 · Sep 16, 2025 · Updated 5 months ago
- ☆91 · Aug 18, 2024 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆88 · Jun 4, 2024 · Updated last year
- Official repo of dataset-decomposition paper [NeurIPS 2024] ☆21 · Jan 8, 2025 · Updated last year
- ☆64 · Apr 9, 2024 · Updated last year
- Code for NeurIPS 2024 Spotlight: "Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations" ☆91 · Oct 30, 2024 · Updated last year
- HGRN2: Gated Linear RNNs with State Expansion ☆56 · Aug 20, 2024 · Updated last year
- ☆20 · Nov 3, 2024 · Updated last year
- Code for the paper "Function-Space Learning Rates" ☆25 · Jun 3, 2025 · Updated 8 months ago
- Adaptation of titans-pytorch to llama models on HF ☆26 · Mar 6, 2025 · Updated 11 months ago
- ☆12 · Feb 16, 2024 · Updated 2 years ago
- ☆14 · Apr 29, 2025 · Updated 10 months ago
- Towards Understanding Sharpness-Aware Minimization [ICML 2022] ☆38 · Jun 14, 2022 · Updated 3 years ago
- Code and Data Repo for the CoNLL Paper -- Future Lens: Anticipating Subsequent Tokens from a Single Hidden State ☆20 · Oct 24, 2025 · Updated 4 months ago
- ☆25 · Aug 23, 2024 · Updated last year
- ☆52 · Jun 10, 2024 · Updated last year
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024) ☆67 · Mar 27, 2025 · Updated 11 months ago
- HyPe: Better Pre-trained Language Model Fine-tuning with Hidden Representation Perturbation [ACL 2023] ☆14 · Jul 11, 2023 · Updated 2 years ago
- Code for paper: "Executing Arithmetic: Fine-Tuning Large Language Models as Turing Machines" ☆11 · Oct 11, 2024 · Updated last year
- Label shift estimation for transfer difficulty with Familiarity. ☆10 · Feb 4, 2025 · Updated last year
- Code for "What really matters in matrix-whitening optimizers?" ☆21 · Oct 31, 2025 · Updated 4 months ago
- ☆13 · Nov 5, 2024 · Updated last year
- Code for 'Contrastive Multi-Document Question Generation' ☆11 · Oct 16, 2022 · Updated 3 years ago
- ☆10 · Dec 17, 2020 · Updated 5 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆27 · Oct 13, 2024 · Updated last year
- [ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at… ☆103 · Jun 14, 2024 · Updated last year
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆31 · Nov 14, 2023 · Updated 2 years ago
- Experimental playground for benchmarking language model (LM) architectures, layers, and tricks on smaller datasets. Designed for flexible… ☆102 · Jan 26, 2026 · Updated last month
- PyTorch implementation of paper "Efficient Nearest Neighbor Language Models" (EMNLP 2021) ☆75 · Jan 20, 2022 · Updated 4 years ago
- Test-time-training on nearest neighbors for large language models ☆49 · Apr 18, 2024 · Updated last year
- Code for EMNLP 2021 paper "Measuring Association Between Labels and Free-Text Rationales" ☆12 · Sep 12, 2023 · Updated 2 years ago
- ☆13 · Jan 20, 2023 · Updated 3 years ago
- Official Implementation of "Transferring Inductive Biases Through Knowledge Distillation" ☆15 · Jun 3, 2020 · Updated 5 years ago
- ☆12 · Jan 4, 2024 · Updated 2 years ago
- ☆19 · Jul 31, 2025 · Updated 7 months ago
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode… ☆63 · Oct 3, 2025 · Updated 4 months ago
- ☆51 · Mar 2, 2024 · Updated last year
- Code and Data for "Long-context LLMs Struggle with Long In-context Learning" [TMLR2025] ☆112 · Feb 20, 2025 · Updated last year
- ☆46 · Oct 11, 2023 · Updated 2 years ago