facebookresearch / chai
CHAI is a library for dynamic pruning of attention heads, enabling efficient LLM inference.
☆22 · Updated 10 months ago
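The core idea, in rough form: at inference time, score each attention head's contribution and skip or zero out the heads that fall below a threshold. The sketch below illustrates this with a hypothetical `prune_heads` helper in PyTorch; it is not CHAI's actual API or algorithm (see the repo and paper for its dynamic method), and the importance scores and threshold here are placeholders.

```python
# Minimal sketch of attention-head pruning, assuming PyTorch.
# NOTE: hypothetical illustration only; `prune_heads`, the scoring, and the
# threshold are NOT part of CHAI's API. See the repo/paper for its method.
import torch

def prune_heads(attn_out: torch.Tensor,
                head_importance: torch.Tensor,
                threshold: float = 0.1) -> torch.Tensor:
    """Zero out the outputs of low-importance attention heads.

    attn_out:        (batch, n_heads, seq_len, head_dim) per-head outputs
    head_importance: (n_heads,) scores, e.g. from activation statistics
    """
    keep = (head_importance >= threshold).to(attn_out.dtype)  # (n_heads,)
    return attn_out * keep.view(1, -1, 1, 1)  # broadcast over batch/seq/dim

# Toy usage: 2 sequences, 8 heads, length 16, head dim 64
out = torch.randn(2, 8, 16, 64)
scores = torch.rand(8)
print(prune_heads(out, scores).shape)  # torch.Size([2, 8, 16, 64])
```

In a real implementation the pruned heads would be skipped entirely (their QKV projections never computed) rather than zeroed after the fact, which is where the inference savings come from.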
Alternatives and similar repositories for chai
Users interested in chai are comparing it to the libraries listed below.
- Official code for the paper "Attention as a Hypernetwork" · ☆45 · Updated last year
- [ICML 2024] Official repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" · ☆10 · Updated last year
- JAX Scalify: end-to-end scaled arithmetic · ☆16 · Updated last year
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning · ☆131 · Updated last month
- Triton implementation of bi-directional (non-causal) linear attention · ☆56 · Updated 9 months ago
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers · ☆13 · Updated 7 months ago
- ☆27 · Updated last year
- Open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity" · ☆26 · Updated 11 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… · ☆28 · Updated last year
- Fork of the Flame repo for training some new stuff in development · ☆18 · Updated 3 weeks ago
- ☆32 · Updated last year
- Here we will test various linear attention designs. · ☆61 · Updated last year
- ☆34 · Updated last year
- Simple implementation of muP, based on the Spectral Condition for Feature Learning. The implementation is SGD-only; don't use it for Adam. · ☆85 · Updated last year
- ☆13 · Updated last year
- Implementation for the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning" · ☆43 · Updated 2 years ago
- Official repository for the ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" · ☆21 · Updated 2 weeks ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling · ☆39 · Updated last year
- Official repository for the paper "VQDM: Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization" · ☆34 · Updated last year
- Scaling Sparse Fine-Tuning to Large Language Models · ☆17 · Updated last year
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) · ☆24 · Updated last year
- Implementation of the 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make it practical in Fast and Simplex, Ro… · ☆47 · Updated 2 months ago
- Official code implementation for "A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models" · ☆20 · Updated last year
- ☆18 · Updated last year
- Explorations into adversarial losses on top of autoregressive loss for language modeling · ☆38 · Updated 8 months ago
- Why Do We Need Weight Decay in Modern Deep Learning? [NeurIPS 2024] · ☆68 · Updated last year
- Personal website · ☆16 · Updated 2 weeks ago
- PyTorch implementation of StableMask (ICML'24) · ☆14 · Updated last year
- ☆53 · Updated last year
- Official code implementation for the work "Preference Alignment with Flow Matching" (NeurIPS 2024) · ☆59 · Updated 11 months ago