zaydzuhri / pythia-mlkv

Multi-Layer Key-Value sharing experiments on Pythia models

☆33

Alternatives and similar repositories for pythia-mlkv

Users who are interested in pythia-mlkv are comparing it to the repositories listed below.

kiddyboots216 / lottery-ticket-adaptation
Lottery Ticket Adaptation
☆39 Updated 8 months ago
The-Inscrutable-X / TACQ
Official Repository for Task-Circuit Quantization
☆21 Updated 2 months ago
TRI-ML / linear_open_lm
A repository for research on medium-sized language models.
☆78 Updated last year
ZihanWang314 / coeCheck
☆19 Updated 4 months ago
frankxwang / dpo-prefix-sharing
DPO, but faster 🚀
☆43 Updated 7 months ago
du-nlp-lab / MLR-Copilot
☆66 Updated 4 months ago
jiwonsong-dev / ReasoningPathCompression
Official implementation of "Reasoning Path Compression: Compressing Generation Trajectories for Efficient LLM Reasoning"
☆20 Updated 2 months ago
ElleLeonne / Lightning-ReLoRA
A public implementation of the ReLoRA pretraining method, built on Lightning-AI's PyTorch Lightning suite.
☆33 Updated last year
thunlp / APB
Official Implementation of APB (ACL 2025 main Oral)
☆29 Updated 5 months ago
xverse-ai / XVERSE-MoE-A36B
XVERSE-MoE-A36B: A multilingual large language model developed by XVERSE Technology Inc.
☆39 Updated 10 months ago
samchaineau / llm_slerp_generation
Repo hosting code and materials related to speeding up LLM inference using token merging.
☆36 Updated last week
nanowell / Q-Sparse-LLM
My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated
☆33 Updated 11 months ago
recursal / GoldFinch-paper
GoldFinch and other hybrid transformer components
☆46 Updated last year
menhguin / minp_paper
Code Implementation, Evaluations, Documentation, Links and Resources for Min P paper
☆37 Updated 4 months ago
OpenMachine-ai / transformer-tricks
A collection of tricks and tools to speed up transformer models
☆169 Updated last month
kyegomez / Infini-attention
Implementation of the paper: "Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention" from Google in pyTO…
☆56 Updated this week
Zyphra / Zyda_processing
☆37 Updated last year
LLM360 / k2-data-prep
☆20 Updated last year
OpenNLPLab / LASP
Linear Attention Sequence Parallelism (LASP)
☆85 Updated last year
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆86 Updated last month
woct0rdho / transformers-qwen3-moe-fused
Fused Qwen3 MoE layer for faster training, compatible with HF Transformers, LoRA, 4-bit quant, Unsloth
☆137 Updated this week
LLM360 / crystalcoder-data-prep
Data preparation code for CrystalCoder 7B LLM
☆45 Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆94 Updated 8 months ago
Infini-AI-Lab / gsm_infinite
☆51 Updated last month
BorealisAI / neuzip
Official repository for the paper "NeuZip: Memory-Efficient Training and Inference with Dynamic Compression of Neural Networks". This rep…
☆59 Updated 9 months ago
NathanGodey / qfilters
Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812)
☆34 Updated 4 months ago
tanaymeh / mamba-train
A single repo with all scripts and utils to train / fine-tune the Mamba model with or without FIM
☆56 Updated last year
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆98 Updated 10 months ago
RWKV / ZeroCoT
https://x.com/BlinkDL_AI/status/1884768989743882276
☆28 Updated 2 months ago
StigLidu / DualDistill
The official implementation for paper "Agentic-R1: Distilled Dual-Strategy Reasoning"
☆84 Updated last week