BorealisAI / flora-opt
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" (ICML 2024).
☆82
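The paper's core observation is that a LoRA-style update acts like a random down-projection of the gradient, so optimizer state such as momentum can be kept in the projected low-rank space and decompressed only when the weights are updated. The sketch below illustrates that idea on a single weight matrix; the function and variable names are illustrative and are not the flora-opt API.

```python
import torch

def compressed_momentum_step(param, grad, state, rank=8, beta=0.9, lr=1e-3):
    """Illustrative sketch of gradient compression via a random down-projection,
    in the spirit of Flora; not the flora-opt API."""
    m, n = grad.shape
    if "proj" not in state:
        # Random projection (n x rank). The paper resamples it periodically
        # (it can be regenerated from a seed); this sketch omits that step.
        state["proj"] = torch.randn(n, rank, device=grad.device) / rank ** 0.5
        state["momentum"] = torch.zeros(m, rank, device=grad.device)
    proj = state["proj"]
    # Accumulate momentum in the compressed space: (m x n) @ (n x rank) -> (m x rank)
    state["momentum"].mul_(beta).add_(grad @ proj, alpha=1 - beta)
    # Decompress only when applying the update: (m x rank) @ (rank x n) -> (m x n)
    param.data.add_(state["momentum"] @ proj.t(), alpha=-lr)

# Toy usage on a single weight matrix.
W = torch.nn.Parameter(torch.randn(64, 32))
state = {}
(W.sum() ** 2).backward()
compressed_momentum_step(W, W.grad, state)
```

The memory saving comes from storing momentum at rank `rank` instead of the full matrix; because the projection can be regenerated from its seed, it adds little extra state in the full method described in the paper.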
Related projects
Alternatives and complementary repositories for flora-opt
- Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention" (☆93)
- [NeurIPS 2024] Official repository of "The Mamba in the Llama: Distilling and Accelerating Hybrid Models" (☆174)
- Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation" (☆122)
- Parameter-Efficient Sparsity Crafting from Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (☆129)
- PyTorch implementation of "Compressed Context Memory for Online Language Model Interaction" (ICLR 2024) (☆50)
- Token Omission Via Attention (☆121)
- Official repository for Inheritune (☆105)
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind (☆112)
- Some preliminary explorations of Mamba's context scaling (☆191)
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" (☆214)
- Language models scale reliably with over-training and on downstream tasks (☆94)
- Code accompanying the paper "Massive Activations in Large Language Models" (☆123)
- The simplest, fastest repository for training/finetuning medium-sized GPTs (☆84)
- [ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy" (☆64)
- Layer-Condensed KV cache with 10x larger batch size, fewer parameters, and less computation; dramatic speedup with better task performance… (☆139)
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters (☆104)
- Code for reproducing our paper "Not All Language Model Features Are Linear" (☆61)
- Understand and test language model architectures on synthetic tasks (☆163)
- The official repo for "LLoCo: Learning Long Contexts Offline" (☆113)
- AnchorAttention: Improved attention for LLM long-context training (☆142)