Caiyun-AI / MUDDFormerLinks

☆63

Alternatives and similar repositories for MUDDFormer

Users that are interested in MUDDFormer are comparing it to the libraries listed below

Sorting:

TsinghuaC3I / Fourier-Position-Embedding
[ICML 2025] Fourier Position Embedding: Enhancing Attention’s Periodic Extension for Length Generalization
☆75Updated last month
juzhengz / LoRI
[COLM 2025] LoRI: Reducing Cross-Task Interference in Multi-Task Low-Rank Adaptation
☆136Updated last week
maple-research-lab / SLOT
☆91Updated last month
Caiyun-AI / DCFormer
☆211Updated 5 months ago
THUDM / Awesome-Parameter-Efficient-Fine-Tuning-for-Foundation-Models
Parameter-Efficient Fine-Tuning for Foundation Models
☆73Updated 3 months ago
dhcode-cpp / NSA-pytorch
DeepSeek Native Sparse Attention pytorch implementation
☆73Updated 4 months ago
OpenNLPLab / lightning-attention
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models
☆322Updated 4 months ago
Chongjie-Si / Subspace-Tuning
A generalized framework for subspace tuning methods in parameter efficient fine-tuning.
☆147Updated 2 weeks ago
liangyuwang / zo2
ZO2 (Zeroth-Order Offloading): Full Parameter Fine-Tuning 175B LLMs with 18GB GPU Memory
☆148Updated 3 weeks ago
kyegomez / SwitchTransformers
Implementation of Switch Transformers from the paper: "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficien…
☆110Updated 3 months ago
Chaos96 / fourierft
☆147Updated 10 months ago
inclusionAI / Ring
Ring is a reasoning MoE LLM provided and open-sourced by InclusionAI, derived from Ling.
☆85Updated 3 weeks ago
InternLM / Condor
[ACL 2025] An official pytorch implement of the paper: Condor: Enhance LLM Alignment with Knowledge-Driven Data Synthesis and Refinement
☆30Updated last month
mdy666 / Qwen-Native-Sparse-Attention
qwen-nsa
☆68Updated 3 months ago
jwzhanggy / tinyBIG
tinybig for deep function learning
☆61Updated last month
SkyworkAI / Skywork-Reward-V2
Scaling Preference Data Curation via Human-AI Synergy
☆69Updated last week
transformer-vq / transformer_vq
☆195Updated last year
pangu-tech / pangu-ultra
☆64Updated last month
waltonfuture / Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
☆49Updated last month
Tencent / llm.hunyuan.T1
☆77Updated 3 months ago
lzhxmu / CPPO
CPPO: Accelerating the Training of Group Relative Policy Optimization-Based Reasoning Models
☆143Updated last month
bigai-nlco / TokenSwift
[ICML 2025] |TokenSwift: Lossless Acceleration of Ultra Long Sequence Generation
☆110Updated last month
JT-Ushio / MHA2MLA
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
☆178Updated 3 weeks ago
wutaiqiang / MoSLoRA
☆108Updated last year
pprp / Awesome-Efficient-MoE
Efficient Mixture of Experts for LLM Paper List
☆79Updated 7 months ago
Outsider565 / LoRA-GA
☆202Updated 8 months ago
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆99Updated last week
WailordHe / DenseSSM
A repository for DenseSSMs
☆87Updated last year
fxmeng / TransMLA
TransMLA: Multi-Head Latent Attention Is All You Need
☆327Updated last week
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆121Updated 6 months ago