xichen-fy / FiraLinks

Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?

☆115

Alternatives and similar repositories for Fira

Users that are interested in Fira are comparing it to the libraries listed below

Sorting:

thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆96Updated 9 months ago
SalesforceAIResearch / GemFilter
☆86Updated 8 months ago
Infini-AI-Lab / Multiverse
☆96Updated last month
OpenSparseLLMs / Linear-MoE
☆119Updated 4 months ago
OpenSparseLLMs / Linearization
☆61Updated 3 months ago
OpenSparseLLMs / MoM
☆101Updated 3 weeks ago
UNITES-Lab / MC-SMoE
[ICLR‘24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆95Updated 3 months ago
Lucky-Lance / Expert_Sparsity
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆102Updated last year
Infini-AI-Lab / S2FT
☆19Updated 9 months ago
efficientscaling / Z1
[EMNLP 2025 Industry] Repo for "Z1: Efficient Test-time Scaling with Code"
☆64Updated 6 months ago
sail-sg / SimLayerKV
The official implementation of paper: SimLayerKV: A Simple Framework for Layer-Level KV Cache Reduction.
☆49Updated 11 months ago
pprp / Pruner-Zero
[ICML24] Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
☆94Updated 10 months ago
haonan3 / AnchorContext
AnchorAttention: Improved attention for LLMs long-context training
☆213Updated 8 months ago
Kwai-Klear / KlearReasoner
Klear-Reasoner: Advancing Reasoning Capability via Gradient-Preserving Clipping Policy Optimization
☆73Updated 2 weeks ago
TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆99Updated 4 months ago
bigai-nlco / TokenSwift
[ICML 2025] |TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
☆113Updated 4 months ago
Multiverse4FM / Multiverse
☆77Updated 3 months ago
zhijie-group / SIFT
SIFT: Grounding LLM Reasoning in Contexts via Stickers
☆58Updated 7 months ago
thu-nics / MoA
[CoLM'25] The official implementation of the paper <MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression>
☆146Updated 3 months ago
qiuzh20 / gated_attention
The official implementation for [NeurIPS2025 Oral] Gated Attention for Large Language Models: Non-linearity, Sparsity, and Attention-Sink…
☆89Updated 3 weeks ago
horseee / CoT-Valve
CoT-Valve: Length-Compressible Chain-of-Thought Tuning
☆86Updated 7 months ago
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆172Updated last year
PKU-ML / LongPPL
Code for ICLR 2025 Paper "What is Wrong with Perplexity for Long-context Language Modeling?"
☆102Updated this week
abdelfattah-lab / SplitReason
☆18Updated 3 months ago
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆88Updated 10 months ago
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆123Updated 8 months ago
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆67Updated last year
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28Updated last year
sail-sg / LongSpec
LongSpec: Long-Context Lossless Speculative Decoding with Efficient Drafting and Verification
☆64Updated 2 months ago
Infini-AI-Lab / gsm_infinite
☆54Updated 4 months ago