facebookresearch / chai
CHAI is a library for dynamic pruning of attention heads for efficient LLM inference.
☆16 Updated 5 months ago
Alternatives and similar repositories for chai
Users who are interested in chai are comparing it to the libraries listed below.
- JAX Scalify: end-to-end scaled arithmetics ☆16 Updated 7 months ago
- ☆32 Updated last year
- ☆31 Updated last year
- GoldFinch and other hybrid transformer components ☆10 Updated 3 weeks ago
- Experimental scripts for researching data adaptive learning rate scheduling. ☆23 Updated last year
- Here we will test various linear attention designs. ☆59 Updated last year
- [NeurIPS 2023] Sparse Modular Activation for Efficient Sequence Modeling ☆36 Updated last year
- ☆17 Updated last year
- Learning Compiler Pass Orders using Coreset and Normalized Value Prediction. (ICML 2023) ☆19 Updated last year
- ☆15 Updated 2 months ago
- Implementation of the LDP module block in PyTorch and Zeta from the paper: "MobileVLM: A Fast, Strong and Open Vision Language Assistant … ☆15 Updated last year
- Official repository for VQDM: Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization paper ☆33 Updated 8 months ago
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆22 Updated last year
- Open-Source LLM Coders with Co-Evolving Reinforcement Learning ☆40 Updated this week
- [Oral; NeurIPS OPT2024] μLO: Compute-Efficient Meta-Generalization of Learned Optimizers ☆12 Updated 2 months ago
- GoldFinch and other hybrid transformer components ☆45 Updated 10 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes" ☆28 Updated last year
- From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients. Ajay Jaiswal, Lu Yin, Zhenyu Zhang, Shiwei Liu,… ☆47 Updated last month
- Multi-Layer Key-Value sharing experiments on Pythia models ☆33 Updated 11 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models" ☆10 Updated 10 months ago
- ☆13 Updated 9 months ago
- Training hybrid models for dummies. ☆21 Updated 4 months ago
- Implementation of https://arxiv.org/pdf/2312.09299 ☆20 Updated 11 months ago
- An open source replication of the strawberry method that leverages Monte Carlo Search with PPO and/or DPO ☆29 Updated this week
- Reference implementation of "Softmax Attention with Constant Cost per Token" (Heinsen, 2024) ☆24 Updated last year
- Official Code Implementation for 'A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models' ☆19 Updated 10 months ago
- Official code for the paper "Attention as a Hypernetwork" ☆36 Updated 11 months ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ☆20 Updated last year
- My Implementation of Q-Sparse: All Large Language Models can be Fully Sparsely-Activated ☆32 Updated 9 months ago
- Official repository for ICML 2024 paper "MoRe Fine-Tuning with 10x Fewer Parameters" ☆18 Updated 2 weeks ago