BorealisAI / flora-opt

This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors", published at ICML 2024.

☆104

Alternatives and similar repositories for flora-opt

Users who are interested in flora-opt are comparing it to the libraries listed below.

minyoungg / LTE
☆69 Updated last year
FasterDecoding / BitDelta
☆203 Updated 11 months ago
lucidrains / PEER-pytorch
Pytorch implementation of the PEER block from the paper, Mixture of A Million Experts, by Xu Owen He at Deepmind
☆131 Updated 3 weeks ago
sebulo / LoQT
☆81 Updated last year
HanGuo97 / lq-lora
☆128 Updated last year
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆127 Updated last year
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆147 Updated last year
lucidrains / coconut-pytorch
Implementation of 🥥 Coconut, Chain of Continuous Thought, in Pytorch
☆180 Updated 5 months ago
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆231 Updated last month
nikhilgsh / loraplus
☆228 Updated last year
SalesforceAIResearch / GemFilter
☆85 Updated last week
jzhang38 / LongMamba
Some preliminary explorations of Mamba's context scaling.
☆217 Updated last year
Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130 Updated 11 months ago
kyleliang919 / Online-Subspace-Descent
[NeurIPS 2024] Low rank memory efficient optimizer without SVD
☆30 Updated 4 months ago
huyphan168 / PEER
Mixture of A Million Experts
☆49 Updated last year
JacobPfau / fillerTokens
☆75 Updated last year
RobertCsordas / moeut
☆88 Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249 Updated 9 months ago
nbasyl / DoRA
Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"
☆124 Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆173 Updated 4 months ago
haonan3 / AnchorContext
AnchorAttention: Improved attention for LLMs long-context training
☆213 Updated 10 months ago
VITA-Group / WeLore
[ICML 2025] From Low Rank Gradient Subspace Stabilization to Low-Rank Weights: Observations, Theories and Applications
☆51 Updated 3 weeks ago
ScalingIntelligence / large_language_monkeys
☆108 Updated last year
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆110 Updated this week
RobertCsordas / moe_attention
Official repository for the paper "SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention"
☆101 Updated last year
mcleish7 / arithmetic
Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024)
☆194 Updated last year
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆156 Updated 7 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆162 Updated 7 months ago
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆110 Updated 7 months ago
wdlctc / mini-s
☆52 Updated last year