[NeurIPS '25] Multi-Token Prediction Needs Registers
☆27Dec 14, 2025Updated 2 months ago
Alternatives and similar repositories for MuToR
Users that are interested in MuToR are comparing it to the libraries listed below
Sorting:
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs☆15Jul 18, 2024Updated last year
- ☆15Sep 24, 2023Updated 2 years ago
- The code for "AttentionPredictor: Temporal Pattern Matters for Efficient LLM Inference", Qingyue Yang, Jie Wang, Xing Li, Zhihai Wang, Ch…☆28Jul 15, 2025Updated 7 months ago
- [ICLR 2025] Official implementation of paper "Dynamic Low-Rank Sparse Adaptation for Large Language Models".☆23Mar 16, 2025Updated 11 months ago
- Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation, ICML 2024☆22Jun 26, 2024Updated last year
- ☆25Oct 31, 2024Updated last year
- Official Implementation of FastKV: Decoupling of Context Reduction and KV Cache Compression for Prefill-Decoding Acceleration☆29Nov 22, 2025Updated 3 months ago
- MiSS is a novel PEFT method that features a low-rank structure but introduces a new update mechanism distinct from LoRA, achieving an exc…☆31Jan 28, 2026Updated last month
- ☆60Jan 12, 2026Updated last month
- ☆32Nov 11, 2024Updated last year
- [ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…☆67Apr 15, 2024Updated last year
- Official Repo for SparseLLM: Global Pruning of LLMs (NeurIPS 2024)☆67Mar 27, 2025Updated 11 months ago
- Unofficial implementations of block/layer-wise pruning methods for LLMs.☆77Apr 29, 2024Updated last year
- ☆41Mar 28, 2024Updated last year
- ☆33Dec 17, 2025Updated 2 months ago
- (ACL 2025 oral) SCOPE: Optimizing KV Cache Compression in Long-context Generation☆34May 28, 2025Updated 9 months ago
- RL with Experience Replay☆55Jul 27, 2025Updated 7 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)☆35Mar 7, 2025Updated 11 months ago
- [NeurIPS 2025] What Makes a Reward Model a Good Teacher? An Optimization Perspective☆42Sep 18, 2025Updated 5 months ago
- The official repository of "Improving Large Language Models via Fine-grained Reinforcement Learning with Minimum Editing Constraint"☆39Jan 12, 2024Updated 2 years ago
- [EMNLP 2024] RoLoRA: Fine-tuning Rotated Outlier-free LLMs for Effective Weight-Activation Quantization☆38Sep 24, 2024Updated last year
- [ICLR 2024] This is the official PyTorch implementation of "QLLM: Accurate and Efficient Low-Bitwidth Quantization for Large Language Mod…☆39Mar 11, 2024Updated last year
- [NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains☆79Jul 29, 2025Updated 7 months ago
- Compressed LLMs for Efficient Text Generation [ICLR'24 Workshop]☆90Sep 13, 2024Updated last year
- Linear Attention Sequence Parallelism (LASP)☆88Jun 4, 2024Updated last year
- Enable Next-sentence Prediction for Large Language Models with Faster Speed, Higher Accuracy and Longer Context☆41Aug 16, 2024Updated last year
- Generalization in Metric Learning: Should the Embedding Layer be the Embedding Layer?☆11Jan 3, 2019Updated 7 years ago
- ☆35Mar 25, 2024Updated last year
- ☆11Aug 20, 2025Updated 6 months ago
- Official implementation of the paper "Pretraining Language Models to Ponder in Continuous Space"☆25Jul 21, 2025Updated 7 months ago
- Code for utilising VAE as means of doing exact MCMC inference in complex high-dimensional space☆14Jun 20, 2023Updated 2 years ago
- ☆12Aug 6, 2024Updated last year
- The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"☆41Oct 11, 2024Updated last year
- The official code for "Advancing Multimodal Large Language Models with Quantization-Aware Scale Learning for Efficient Adaptation" | [MM2…☆14Dec 7, 2024Updated last year
- Code for paper "Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion"☆14Mar 28, 2024Updated last year
- [ICLR 2025 SynthData Workshop Spotlight] Empowering LLMs in Decision Games through Algorithmic Data Synthesis☆26Apr 27, 2025Updated 10 months ago
- Reference implementation of models from Nyonic Model Factory☆12May 13, 2024Updated last year
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization☆13Mar 20, 2025Updated 11 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…☆46Jun 30, 2024Updated last year