facebookresearch / chai
CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.
☆19Updated 9 months ago
Alternatives and similar repositories for chai
Users who are interested in chai are comparing it to the libraries listed below.
- Official code for the paper "Attention as a Hypernetwork"☆41Updated last year
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Updated last year
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models'☆20Updated last year
- The open-source materials for paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity".☆25Updated 10 months ago
- JAX Scalify: end-to-end scaled arithmetics☆16Updated 10 months ago
- The implementation for MLSys 2023 paper: "Cuttlefish: Low-rank Model Training without All The Tuning"☆45Updated 2 years ago
- Official repository for VQDM: Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization paper☆33Updated 11 months ago
- Triton implementation of bi-directional (non-causal) linear attention☆54Updated 7 months ago
- Scaling Sparse Fine-Tuning to Large Language Models☆17Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models☆33Updated last year
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling☆40Updated last year
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model"☆51Updated 6 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning☆125Updated last month
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro…☆46Updated last week
- ☆13Updated last year
- ☆27Updated last year
- Here we will test various linear attention designs.☆62Updated last year
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto☆56Updated last year
- Official code implementation for the work Preference Alignment with Flow Matching (NeurIPS 2024)☆58Updated 10 months ago
- ☆32Updated 10 months ago
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD only; don't use it for Adam☆85Updated last year
- ☆53Updated last year
- The official repo of continuous speculative decoding☆27Updated 5 months ago
- Official PyTorch implementation of "LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging" (ICML 2024)☆30Updated last year
- Explorations into the recently proposed Taylor Series Linear Attention☆100Updated last year
- Implementation of the proposed DeepCrossAttention by Heddes et al at Google research, in Pytorch☆92Updated 6 months ago
- Personal website☆15Updated 2 months ago
- Exploring an idea where one forgets about efficiency and carries out attention across each edge of the nodes (tokens)☆53Updated 5 months ago
- ☆34Updated last year
- Implementation of Diffusion Transformers and Rectified Flow in Jax☆25Updated last year