facebookresearch / chai
CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.
☆22 · Updated last year
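The listing above only names CHAI's technique (dynamic attention-head pruning) without detail. As a minimal sketch of the general idea of masking out attention heads at inference time — under my own assumptions, not CHAI's actual clustering-based method or API; the helper name and the fixed mask are hypothetical — consider:

```python
# Illustrative sketch only: zeroing out a subset of attention heads
# inside standard multi-head attention. Not CHAI's implementation.
import torch
import torch.nn.functional as F

def masked_multihead_attention(q, k, v, head_mask):
    """q, k, v: (batch, heads, seq, dim); head_mask: bool tensor of shape (heads,)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (batch, heads, seq, seq)
    attn = F.softmax(scores, dim=-1)
    out = attn @ v                                          # (batch, heads, seq, dim)
    # Pruned heads contribute nothing to the downstream output projection.
    return out * head_mask.view(1, -1, 1, 1)

if __name__ == "__main__":
    torch.manual_seed(0)
    q = k = v = torch.randn(1, 8, 16, 64)
    keep = torch.tensor([1, 1, 0, 0, 1, 0, 1, 1], dtype=torch.bool)  # hypothetical pruning decision
    print(masked_multihead_attention(q, k, v, keep).shape)  # torch.Size([1, 8, 16, 64])
```

A dynamic scheme in the spirit of the paper would compute the head mask per input rather than fixing it up front as this toy example does.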
Alternatives and similar repositories for chai
Users interested in chai are comparing it to the libraries listed below.
- Official code for the paper "Attention as a Hypernetwork" · ☆47 · Updated last year
- JAX Scalify: end-to-end scaled arithmetic · ☆18 · Updated last year
- ☆18 · Updated 9 months ago
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling · ☆40 · Updated 2 years ago
- Triton implementation of bi-directional (non-causal) linear attention · ☆65 · Updated last week
- Official code implementation for the work Preference Alignment with Flow Matching (NeurIPS 2024) · ☆66 · Updated last year
- Simple implementation of muP, based on Spectral Condition for Feature Learning. The implementation is SGD-only; don't use it for Adam · ☆85 · Updated last year
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity" · ☆30 · Updated last year
- Here we will test various linear attention designs. · ☆62 · Updated last year
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models' · ☆20 · Updated last year
- Repository for "TESS-2: A Large-Scale, Generalist Diffusion Language Model" · ☆54 · Updated 11 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… · ☆31 · Updated 2 years ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models · ☆35 · Updated last year
- Scaling Sparse Fine-Tuning to Large Language Models · ☆18 · Updated 2 years ago
- [Oral; NeurIPS OPT 2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers · ☆14 · Updated 10 months ago
- 32 times longer context window than vanilla Transformers and up to 4 times longer than memory-efficient Transformers. · ☆50 · Updated 2 years ago
- Awesome Triton Resources · ☆39 · Updated 9 months ago
- [ICLR 2025 & COLM 2025] Official PyTorch implementation of the Forgetting Transformer and Adaptive Computation Pruning · ☆137 · Updated last month
- Exploration into the proposed "Self Reasoning Tokens" by Felipe Bonetto · ☆57 · Updated last year
- ☆17 · Updated 8 months ago
- Implementation of 2-simplicial attention proposed by Clift et al. (2019) and the recent attempt to make practical in Fast and Simplex, Ro… · ☆46 · Updated 5 months ago
- Tiny re-implementation of MDM in the style of LLaDA and the nano-gpt speedrun · ☆56 · Updated 11 months ago
- PyTorch implementation of StableMask (ICML'24) · ☆15 · Updated last year
- My attempt to improve the speed of the Newton-Schulz algorithm, starting from the Dion implementation. · ☆29 · Updated 2 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" · ☆10 · Updated last year
- The implementation for the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning" · ☆45 · Updated 2 years ago
- A fusion of a linear layer and a cross-entropy loss, written for PyTorch in Triton. · ☆75 · Updated last year
- ☆18 · Updated last year
- ☆34 · Updated last year
- Implementation of Hyena Hierarchy in JAX · ☆10 · Updated 2 years ago