JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆166 · Updated last week
Alternatives and similar repositories for MHA2MLA
Users that are interested in MHA2MLA are comparing it to the libraries listed below
- Parallel Scaling Law for Language Models — Beyond Parameter and Inference Time Scaling ☆345 · Updated 2 weeks ago
- ☆188 · Updated last month
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning ☆213 · Updated 2 weeks ago
- Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models ☆132 · Updated 11 months ago
- ☆201 · Updated 3 months ago
- Exploring the Limit of Outcome Reward for Learning Mathematical Reasoning ☆179 · Updated 2 months ago
- Efficient Triton implementation of Native Sparse Attention. ☆155 · Updated last week
- Code for "Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate" ☆151 · Updated last month
- TransMLA: Multi-Head Latent Attention Is All You Need ☆284 · Updated this week
- [ICLR 2025] COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training ☆203 · Updated 2 weeks ago
- A highly capable 2.4B lightweight LLM using only 1T tokens of pre-training data, with all details released. ☆182 · Updated this week
- [ICML 2025] TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation ☆101 · Updated 2 weeks ago
- Implementation of the paper "LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens" ☆136 · Updated 10 months ago
- 🔥 A minimal training framework for scaling FLA models ☆146 · Updated 2 weeks ago
- ☆77 · Updated last month
- A lightweight reproduction of DeepSeek-R1-Zero with in-depth analysis of self-reflection behavior. ☆239 · Updated last month
- Rethinking RL Scaling for Vision Language Models: A Transparent, From-Scratch Framework and Comprehensive Evaluation Scheme ☆127 · Updated last month
- An Open Math Pre-training Dataset with 370B Tokens. ☆87 · Updated last month
- CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models ☆126 · Updated last week
- ☆293 · Updated this week
- A Comprehensive Survey on Long Context Language Modeling ☆147 · Updated 2 weeks ago
- [ICLR 2025] DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads ☆463 · Updated 3 months ago
- [ICML 2025] Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale ☆248 · Updated 2 weeks ago
- ☆223 · Updated this week
- Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models" ☆160 · Updated 11 months ago
- Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)" ☆184 · Updated 2 months ago
- [ICLR 2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM. ☆74 · Updated 5 months ago
- ☆80 · Updated 2 weeks ago
- Simple extension on vLLM to help you speed up reasoning models without training. ☆152 · Updated this week
- Chain of Experts (CoE) enables communication between experts within Mixture-of-Experts (MoE) models ☆167 · Updated last week