nbasyl / DoRALinks

Official implementation of "DoRA: Weight-Decomposed Low-Rank Adaptation"

☆124

Alternatives and similar repositories for DoRA

Users that are interested in DoRA are comparing it to the libraries listed below

Sorting:

SalesforceAIResearch / GemFilter
☆82Updated 6 months ago
yegcjs / DiffusionLLM
Code for paper "Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning"
☆83Updated last year
thu-ml / ReMoE
[ICLR2025] Codebase for "ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing", built on Megatron-LM.
☆85Updated 7 months ago
shaochenze / PatchTrain
Code for paper "Patch-Level Training for Large Language Models"
☆86Updated 8 months ago
thu-ml / low-bit-optimizers
Low-bit optimizers for PyTorch
☆130Updated last year
wuhy68 / Parameter-Efficient-MoE
Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks (EMNLP'24)
☆146Updated 10 months ago
ldery / Bonsai
Code for "Everybody Prune Now: Structured Pruning of LLMs with only Forward Passes"
☆28Updated last year
locuslab / massive-activations
Code accompanying the paper "Massive Activations in Large Language Models"
☆173Updated last year
UNITES-Lab / MC-SMoE
[ICLR 2024 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆88Updated last month
jxiw / MambaInLlama
[NeurIPS 2024] Official Repository of The Mamba in the Llama: Distilling and Accelerating Hybrid Models
☆226Updated 3 months ago
itsnamgyu / block-transformer
Block Transformer: Global-to-Local Language Modeling for Fast Inference (NeurIPS 2024)
☆160Updated 3 months ago
YuchuanTian / RethinkTinyLM
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
☆122Updated 6 months ago
sail-sg / scaling-with-vocab
[NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623
☆86Updated 10 months ago
yxli2123 / LoftQ
☆223Updated last year
YuchuanTian / DiJiang
[ICML'24 Oral] The official code of "DiJiang: Efficient Large Language Models through Compact Kernelization", a novel DCT-based linear at…
☆102Updated last year
astramind-ai / Mixture-of-depths
Unofficial implementation for the paper "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆167Updated last year
nikhilgsh / loraplus
☆220Updated last year
snu-mllab / Context-Memory
Pytorch implementation for "Compressed Context Memory For Online Language Model Interaction" (ICLR'24)
☆61Updated last year
haonan3 / AnchorContext
AnchorAttention: Improved attention for LLMs long-context training
☆212Updated 6 months ago
GraphPKU / PiSSA
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models(NeurIPS 2024 Spotlight)
☆367Updated last month
mrflogs / LoRA-Pro
Official code for our paper, "LoRA-Pro: Are Low-Rank Adapters Properly Optimized? "
☆127Updated 3 months ago
Lucky-Lance / Expert_Sparsity
[ACL 2024] Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
☆95Updated last year
htqin / IR-QLoRA
[ICML 2024 Oral] This project is the official implementation of our Accurate LoRA-Finetuning Quantization of LLMs via Information Retenti…
☆67Updated last year
FasterDecoding / BitDelta
☆199Updated 7 months ago
xichen-fy / Fira
Fira: Can We Achieve Full-rank Training of LLMs Under Low-rank Constraint?
☆112Updated 9 months ago
ShiZhengyan / InstructionModelling
[NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"
☆38Updated last year
Infini-AI-Lab / Multiverse
☆79Updated last week
kyegomez / Mixture-of-Depths
Implementation of the paper: "Mixture-of-Depths: Dynamically allocating compute in transformer-based language models"
☆103Updated this week
VITA-Group / Ms-PoE
"Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding" Zhenyu Zhang, Runjin Chen, Shiw…
☆29Updated last year
HanGuo97 / lq-lora
☆127Updated last year