facebookresearch / chai
CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.
☆17 · Updated 8 months ago
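As background for the list below, here is a minimal PyTorch sketch of what dynamic attention-head pruning looks like in general: score each head at inference time and skip the computation for heads below a cutoff. This is an illustrative sketch only, not CHAI's actual algorithm or API; `head_scores` (a hypothetical entropy-based criterion) and `pruned_multihead_attention` are names invented for this example.

```python
import torch
import torch.nn.functional as F

def head_scores(q, k):
    # Hypothetical importance criterion: mean attention entropy per head.
    # Low-entropy ("peaky") heads are treated as important; high-entropy
    # heads become pruning candidates. q, k: (batch, heads, seq, head_dim).
    attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
    entropy = -(attn * attn.clamp_min(1e-9).log()).sum(-1)  # (batch, heads, seq)
    return entropy.mean(dim=(0, 2))                         # (heads,)

def pruned_multihead_attention(q, k, v, keep_mask):
    # Compute attention only for heads where keep_mask is True; pruned heads
    # contribute zeros, so the concatenated output keeps a fixed width.
    b, h, s, d = q.shape
    out = torch.zeros_like(q)
    idx = keep_mask.nonzero(as_tuple=True)[0]  # indices of surviving heads
    if idx.numel() > 0:
        out[:, idx] = F.scaled_dot_product_attention(q[:, idx], k[:, idx], v[:, idx])
    return out.transpose(1, 2).reshape(b, s, h * d)  # (batch, seq, model_dim)

# Toy usage: prune the higher-entropy half of 8 heads at inference time.
q, k, v = (torch.randn(2, 8, 16, 64) for _ in range(3))
scores = head_scores(q, k)
keep = scores <= scores.median()  # keep the more "focused" heads
print(pruned_multihead_attention(q, k, v, keep).shape)  # torch.Size([2, 16, 512])
```

A real system would amortize the scoring step (e.g., decide per layer from calibration data or cached attention statistics) rather than recompute full attention maps to score heads, which would defeat the purpose of pruning.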
Alternatives and similar repositories for chai
Users interested in chai are comparing it to the repositories listed below
- Official code for the paper "Attention as a Hypernetwork"☆40 · Updated last year
- The implementation for the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning"☆45 · Updated 2 years ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference…☆27 · Updated last year
- Official code implementation for "A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models"☆19 · Updated last year
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10 · Updated last year
- Triton implementation of bi-directional (non-causal) linear attention☆51 · Updated 6 months ago
- Code for "RSQ: Learning from Important Tokens Leads to Better Quantized LLMs"☆18 · Updated 2 months ago
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers☆13 · Updated 4 months ago
- JAX Scalify: end-to-end scaled arithmetic☆16 · Updated 9 months ago
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity"☆25 · Updated 9 months ago
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model"☆47 · Updated 5 months ago
- Here we will test various linear attention designs.☆62 · Updated last year
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make it practical in Fast and Simplex, Ro…☆43 · Updated 3 weeks ago
- ☆13 · Updated last year
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling☆38 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆33 · Updated last year
- [ICLR 2025] Official PyTorch implementation of "Forgetting Transformer: Softmax Attention with a Forget Gate"☆121 · Updated last month
- ☆53 · Updated last year
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory efficient Transformers.☆48 · Updated 2 years ago
- OTOv1-v3, NeurIPS, ICLR, TMLR, DNN Training, Compression, Structured Pruning, Erasing Operators, CNN, Diffusion, LLM☆46 · Updated 10 months ago
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"☆119 · Updated last year
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"☆28 · Updated last year
- PyTorch implementation of StableMask (ICML'24)☆13 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024)☆24 · Updated last year
- ☆27 · Updated last year
- Official code implementation for "Preference Alignment with Flow Matching" (NeurIPS 2024)☆57 · Updated 9 months ago
- ☆15 · Updated 2 months ago
- Beyond KV Caching: Shared Attention for Efficient LLMs☆19 · Updated last year
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun☆55 · Updated 5 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind☆127 · Updated 11 months ago