zaydzuhri / pythia-mlkv
Multi-Layer Key-Value sharing experiments on Pythia models
☆33 · Updated last year
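The idea behind the repo, multi-layer KV sharing, can be sketched briefly: groups of consecutive attention layers reuse a single K/V projection, so the KV cache holds one entry per group rather than one per layer. The sketch below is a minimal, hypothetical illustration (the names `mlkv_forward` and `layers_per_group` are ours, not the repo's API), with single-head attention for simplicity.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mlkv_forward(x, wq, wk, wv, layers_per_group):
    """Toy MLKV stack. x: (seq, d); wq: one Q weight per layer;
    wk, wv: one K/V weight per *group* of layers."""
    d = x.shape[-1]
    kv_cache = {}
    for layer, q_w in enumerate(wq):
        group = layer // layers_per_group
        if group not in kv_cache:
            # K/V computed once, by the first layer of the group,
            # then reused by the remaining layers in that group
            kv_cache[group] = (x @ wk[group], x @ wv[group])
        k, v = kv_cache[group]
        q = x @ q_w
        attn = softmax(q @ k.T / np.sqrt(d))
        x = x + attn @ v  # residual connection
    return x, len(kv_cache)  # cache entries = n_layers / layers_per_group

rng = np.random.default_rng(0)
d, n_layers, lpg = 16, 4, 2
wq = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_layers)]
wk = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_layers // lpg)]
wv = [rng.normal(size=(d, d)) * 0.1 for _ in range(n_layers // lpg)]
out, cache_entries = mlkv_forward(rng.normal(size=(8, d)), wq, wk, wv, lpg)
print(out.shape, cache_entries)  # prints (8, 16) 2
```

With 4 layers and groups of 2, the cache stores 2 K/V pairs instead of 4, which is the memory saving MLKV-style sharing targets at inference time.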
Alternatives and similar repositories for pythia-mlkv
Users interested in pythia-mlkv are comparing it to the libraries listed below.
- Lottery Ticket Adaptation ☆39 · Updated 8 months ago
- Official repository for Task-Circuit Quantization ☆21 · Updated 2 months ago
- A repository for research on medium-sized language models. ☆78 · Updated last year
- ☆19 · Updated 4 months ago
- DPO, but faster 🚀 ☆43 · Updated 7 months ago
- ☆66 · Updated 4 months ago
- Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning" ☆20 · Updated 2 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite. ☆33 · Updated last year
- Official implementation of APB (ACL 2025 main, Oral) ☆29 · Updated 5 months ago
- XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc. ☆39 · Updated 10 months ago
- Repo hosting code and materials on speeding up LLM inference using token merging. ☆36 · Updated last week
- My implementation of Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated ☆33 · Updated 11 months ago
- GoldFinch and other hybrid transformer components ☆46 · Updated last year
- Code implementation, evaluations, documentation, links, and resources for the Min-P paper ☆37 · Updated 4 months ago
- A collection of tricks and tools to speed up transformer models ☆169 · Updated last month
- Implementation of the paper "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google, in pyTO… ☆56 · Updated this week
- ☆37 · Updated last year
- ☆20 · Updated last year
- Linear Attention Sequence Parallelism (LASP) ☆85 · Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆86 · Updated last month
- Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quantization, and Unsloth ☆137 · Updated this week
- Data preparation code for the CrystalCoder 7B LLM ☆45 · Updated last year
- RWKV-7: Surpassing GPT ☆94 · Updated 8 months ago
- ☆51 · Updated last month
- Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep… ☆59 · Updated 9 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆34 · Updated 4 months ago
- A single repo with all scripts and utilities to train or fine-tune the Mamba model, with or without FIM ☆56 · Updated last year
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" ☆98 · Updated 10 months ago
- https://x.com/BlinkDL_AI/status/1884768989743882276 ☆28 · Updated 2 months ago
- The official implementation of the paper "Agentic-R1: Distilled Dual-Strategy Reasoning" ☆84 · Updated last week