qiuzh20/EMoE

Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]

☆39

Alternatives and similar repositories for EMoE

Users who are interested in EMoE are comparing it to the repositories listed below.

thunlp / Modularity-Analysis
[ACL 2023 Findings] Emergent Modularity in Pre-trained Transformers
☆26 · Jun 7, 2023 · Updated 3 years ago
kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆61 · Feb 7, 2025 · Updated last year
MTTeql / MT-Teql
Research Artifact For Our Submission To VLDB
☆11 · Oct 27, 2021 · Updated 4 years ago
qiuzh20 / RMoE
Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts)
☆33 · Aug 4, 2024 · Updated last year
yasufumy / spider-schema-linking-dataset
☆15 · Oct 30, 2021 · Updated 4 years ago
thunlp / MoEfication
☆146 · Jul 21, 2024 · Updated 2 years ago
alon-albalak / online-data-mixing
An implementation of online data mixing for the Pile dataset, based on the GPT-NeoX library.
☆14 · Jan 9, 2024 · Updated 2 years ago
SJTU-DENG-Lab / AdaMoE
[Findings of EMNLP 2024] AdaMoE: Token-Adaptive Routing with Null Experts for Mixture-of-Experts Language Models
☆20 · Oct 2, 2024 · Updated last year
Hannibal046 / PlugLM
[ACL2023] Source code for Decouple knowledge from parameters for plug-and-play language modeling
☆20 · Sep 18, 2023 · Updated 2 years ago
Raincleared-Song / ConPET
Source code for a LoRA-based continual relation extraction method.
☆14 · Sep 25, 2023 · Updated 2 years ago
TsinghuaC3I / LLM4BioHypoGen
[COLM 2024] Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation
☆15 · Jul 15, 2024 · Updated 2 years ago
shawntan / SUT
Repository for Sparse Universal Transformers
☆20 · Oct 23, 2023 · Updated 2 years ago
peterljq / Tutorial-of-Data-Distillation-and-Condensation
A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …
☆13 · Dec 1, 2022 · Updated 3 years ago
CodeLLM-Research / CodeJudge-Eval
[COLING25] CodeJudge Eval: Can Large Language Models be Good Judges in Code Understanding?
☆12 · Dec 3, 2024 · Updated last year
phonism / CP-Zero
Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset.
☆18 · Apr 22, 2025 · Updated last year
Thomasyyj / LongBio-Benchmark
A controlled benchmark on evaluating and studying the dynamics of Long Context Language Models
☆26 · Oct 17, 2025 · Updated 9 months ago
SalesforceAIResearch / FoFo
☆27 · Jun 2, 2026 · Updated last month
tomsherborne / zx-parse
Zero-Shot Cross-Lingual Semantic Parsing (Sherborne & Lapata, ACL 2022)
☆17 · May 16, 2022 · Updated 4 years ago
codecaution / EvoMoE
☆21 · Oct 31, 2022 · Updated 3 years ago
LINs-lab / ReLA
[NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations
☆19 · Jan 19, 2025 · Updated last year
circle-hit / MuCDN
Code for COLING 2022 accepted paper titled "MuCDN: Mutual Conversational Detachment Network for Emotion Recognition in Multi-Party Conver…
☆10 · Jul 21, 2023 · Updated 3 years ago
LINs-lab / awesome_papers
☆20 · May 28, 2025 · Updated last year
bloomberg / dataless-model-merging
Code release for Dataless Knowledge Fusion by Merging Weights of Language Models (https://openreview.net/forum?id=FCnohuR6AnM)
☆92 · Jul 25, 2023 · Updated 3 years ago
iai-group / sigir2019-table2vec
☆16 · Oct 1, 2020 · Updated 5 years ago
behavioral-data / BLADE
[EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science
☆35 · Oct 25, 2024 · Updated last year
causalNLP / amr_llm
This repo explores how to use AMR to address tasks difficult for LLMs
☆13 · Jan 15, 2024 · Updated 2 years ago
UNITES-Lab / MC-SMoE
[ICLR'24 Spotlight] Code for the paper "Merge, Then Compress: Demystify Efficient SMoE with Hints from Its Routing Policy"
☆108 · Jun 20, 2025 · Updated last year
Cohere-Labs-Community / parameter-efficient-moe
☆279 · Oct 31, 2023 · Updated 2 years ago
UlisseMini / procgen-tools
Tools for running experiments on RL agents in procgen environments
☆20 · Apr 5, 2024 · Updated 2 years ago
ZeroYuHuang / Transformer-Patcher
☆34 · Aug 5, 2023 · Updated 2 years ago
yfguo91 / Real-Spike
Real Spike: Learning Real-valued Spikes for Spiking Neural Networks
☆11 · Jul 12, 2022 · Updated 4 years ago
varunnair18 / FISH
Code for "Training Neural Networks with Fixed Sparse Masks" (NeurIPS 2021).
☆59 · Jan 14, 2022 · Updated 4 years ago
ThomasScialom / T0_continual_learning
Adding new tasks to T0 without catastrophic forgetting
☆33 · Oct 20, 2022 · Updated 3 years ago
hkust-nlp / PEM_composition
[NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations"
☆61 · Nov 26, 2023 · Updated 2 years ago
SCIR-SC-Qiaoban-Team / FreeEvalLM
[AAAI26] Trade-offs in Large Reasoning Models: An Empirical Analysis of Deliberative and Adaptive Reasoning over Foundational Capabilitie…
☆11 · Feb 7, 2026 · Updated 5 months ago
cambridgeltl / autopeft
AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning (Zhou et al.; TACL 2024)
☆51 · Mar 17, 2024 · Updated 2 years ago
lizhe2016 / RFBFN
☆13 · Apr 9, 2022 · Updated 4 years ago
RUCAIBox / Language-Specific-Neurons
☆91 · Dec 23, 2024 · Updated last year
ArthurConmy / MishformerLens
MishformerLens intends to be a drop-in replacement for TransformerLens that AST patches HuggingFace Transformers rather than implementing…
☆10 · Oct 7, 2024 · Updated last year