UNITES-Lab / Lingual-SMoELinks

[ICLR 2024] Code for the paper "Sparse MoE with Language-Guided Routing for Multilingual Machine Translation"

☆8

Alternatives and similar repositories for Lingual-SMoE

Users that are interested in Lingual-SMoE are comparing it to the libraries listed below

Sorting:

Clin0212 / HydraLoRA
[NeurIPS'24 Oral] HydraLoRA: An Asymmetric LoRA Architecture for Efficient Fine-Tuning
☆220Updated 8 months ago
chuanyang-Zheng / DAPE
The this is the official implementation of "DAPE: Data-Adaptive Positional Encoding for Length Extrapolation"
☆39Updated 10 months ago
DreamLM / DreamOn
Diffusion Language Models For Code Infilling Beyond Fixed-size Canvas
☆54Updated 2 weeks ago
RUCBM / DeepCritic
Official repository for paper "DeepCritic: Deliberate Critique with Large Language Models"
☆33Updated last month
TUDB-Labs / MoE-PEFT
An Efficient LLM Fine-Tuning Factory Optimized for MoE PEFT
☆108Updated 5 months ago
GATECH-EIC / ACT
[ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati…
☆40Updated last year
Raibows / CREAM
Code for "CREAM: Consistency Regularized Self-Rewarding Language Models", ICLR 2025.
☆25Updated 5 months ago
Chaos96 / fourierft
☆147Updated 11 months ago
TUDB-Labs / MixLoRA
State-of-the-art Parameter-Efficient MoE Fine-tuning Method
☆176Updated 11 months ago
DavidFanzz / SCMoE
☆26Updated last year
yczhou001 / Awesome-Diffusion-LLM
paper list, tutorial, and nano code snippet for Diffusion Large Language Models.
☆98Updated last month
ShiZhengyan / DePT
[ICLR 2024] This is the repository for the paper titled "DePT: Decomposed Prompt Tuning for Parameter-Efficient Fine-tuning"
☆95Updated last year
xuyige / SoftCoT
ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs. and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of…
☆38Updated 2 months ago
jianghoucheng / AlphaEdit
AlphaEdit: Null-Space Constrained Knowledge Editing for Language Models, ICLR 2025 (Outstanding Paper)
☆298Updated last month
HKUNLP / diffusion-of-thoughts
[NeurIPS 2024] Code for the paper "Diffusion of Thoughts: Chain-of-Thought Reasoning in Diffusion Language Models"
☆173Updated 5 months ago
ML-GSAI / Diffusion-LLM-Papers
A Collection of Papers on Diffusion Language Models
☆98Updated last week
ML-GSAI / LLaDA-V
☆194Updated last week
zefang-liu / AdaMoLE
AdaMoLE: Adaptive Mixture of LoRA Experts
☆34Updated 10 months ago
TsinghuaC3I / SoRA
[EMNLP 2023, Main Conference] Sparse Low-rank Adaptation of Pre-trained Language Models
☆81Updated last year
GCYZSL / MoLA
☆149Updated last year
MingyuJ666 / Rope_with_LLM
[ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen…
☆77Updated last month
waltonfuture / Diff-eRank
[NeurIPS 2024] A Novel Rank-Based Metric for Evaluating Large Language Models
☆51Updated 2 months ago
LINs-lab / DynMoE
[ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models
☆124Updated last month
zitian-gao / one-shot-em
One-shot Entropy Minimization
☆175Updated 2 months ago
hanyang1999 / discrete-diffusion-papers
A collection of papers on discrete diffusion models
☆156Updated last month
UKPLab / iclr2024-model-merging
This is the repository for "Model Merging by Uncertainty-Based Gradient Matching", ICLR 2024.
☆28Updated last year
haonan3 / AnchorContext
AnchorAttention: Improved attention for LLMs long-context training
☆212Updated 6 months ago
ThreeSR / Awesome-Inference-Time-Scaling
Paper List of Inference/Test Time Scaling/Computing
☆289Updated last month
Nardien / KARD
Official Code Repository for the paper "Knowledge-Augmented Reasoning Distillation for Small Language Models in Knowledge-intensive Tasks…
☆40Updated 8 months ago
Dereck0602 / Awesome_Test_Time_LLMs
☆118Updated 5 months ago