hwang595 / Cuttlefish
Implementation of the MLSys 2023 paper "Cuttlefish: Low-rank Model Training without All The Tuning"
☆43 · Updated last year
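For context on the core idea (this is an illustrative sketch, not the repository's actual API): Cuttlefish trains models with low-rank factorized layers, replacing a dense weight `W` of shape `(m, n)` with a rank-`r` product `U @ V`, which cuts the parameter count from `m*n` to `r*(m+n)`. A minimal NumPy sketch of such a factorization, with hypothetical names:

```python
import numpy as np

def low_rank_factorize(W, r):
    """Approximate W (m x n) by U @ V of rank r via truncated SVD.

    Illustrative only: the Cuttlefish paper selects ranks
    automatically during training rather than fixing r up front.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Fold the leading singular values into the left factor.
    U_r = U[:, :r] * s[:r]   # shape (m, r)
    V_r = Vt[:r, :]          # shape (r, n)
    return U_r, V_r

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 128))
U, V = low_rank_factorize(W, r=32)

# Parameter count drops from m*n to r*(m+n).
full_params = W.size            # 256 * 128 = 32768
lowrank_params = U.size + V.size  # 32 * (256 + 128) = 12288
```

The point of the paper's title is that choosing `r` per layer is normally a tuning burden; Cuttlefish automates that choice.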
Alternatives and similar repositories for Cuttlefish:
Users interested in Cuttlefish are comparing it to the repositories listed below.
- ☆50 · Updated last year
- ☆36 · Updated 6 months ago
- Stick-breaking attention · ☆44 · Updated last month
- Revisiting Efficient Training Algorithms For Transformer-based Language Models (NeurIPS 2023) · ☆79 · Updated last year
- Sparse Backpropagation for Mixture-of-Expert Training · ☆28 · Updated 8 months ago
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models · ☆29 · Updated 8 months ago
- [ICML 2024] "LoCoCo: Dropping In Convolutions for Long Context Compression", Ruisi Cai, Yuandong Tian, Zhangyang Wang, Beidi Chen · ☆16 · Updated 5 months ago
- ☆79 · Updated last year
- Linear Attention Sequence Parallelism (LASP) · ☆79 · Updated 9 months ago
- ☆100 · Updated 11 months ago
- [ICML 2024] SPP: Sparsity-Preserved Parameter-Efficient Fine-Tuning for Large Language Models · ☆18 · Updated 9 months ago
- Official code for the paper "Attention as a Hypernetwork" · ☆24 · Updated 8 months ago
- [ICML 2024] Junk DNA Hypothesis: A Task-Centric Angle of LLM Pre-trained Weights through Sparsity; Lu Yin*, Ajay Jaiswal*, Shiwei Liu, So… · ☆16 · Updated 9 months ago
- [ACL 2024] RelayAttention for Efficient Large Language Model Serving with Long System Prompts · ☆38 · Updated last year
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" · ☆73 · Updated 8 months ago
- Inference speed benchmark for "Learning to (Learn at Test Time): RNNs with Expressive Hidden States" · ☆61 · Updated 7 months ago
- Code for "Everybody Prune Now: Structured Pruning of LLMs with Only Forward Passes" · ☆27 · Updated 11 months ago
- "Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding", Zhenyu Zhang, Runjin Chen, Shiw… · ☆26 · Updated 9 months ago
- ☆30 · Updated last year
- NAACL '24 (Best Demo Paper Runner-Up) / MLSys @ NeurIPS '23 - RedCoast: A Lightweight Tool to Automate Distributed Training and Inference · ☆64 · Updated 2 months ago
- Fast and memory-efficient exact attention · ☆64 · Updated this week
- Repository of the paper "Accelerating Transformer Inference for Translation via Parallel Decoding" · ☆114 · Updated 11 months ago
- [ICLR 2023] "Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers" by Tianlong Chen*, Zhenyu Zhang*, Ajay Jaiswal… · ☆48 · Updated 2 years ago
- Squeezed Attention: Accelerating Long Prompt LLM Inference · ☆42 · Updated 3 months ago
- Repo for ACL 2023 Findings paper "Emergent Modularity in Pre-trained Transformers" · ☆22 · Updated last year
- 🔥 A minimal training framework for scaling FLA models · ☆73 · Updated this week
- A fused linear layer and cross-entropy loss, written for PyTorch in Triton · ☆62 · Updated 7 months ago
- A testbed for various linear attention designs · ☆59 · Updated 10 months ago
- ☆37 · Updated last week