Official PyTorch Implementation of EMoE: Unlocking Emergent Modularity in Large Language Models [main conference @ NAACL2024]
☆39 · May 28, 2024 · Updated last year
Alternatives and similar repositories for EMoE
Users who are interested in EMoE are comparing it to the repositories listed below.
- Official implementation of RMoE (Layerwise Recurrent Router for Mixture-of-Experts) ☆29 · Aug 4, 2024 · Updated last year
- [NeurIPS 2024] Efficiency for Free: Ideal Data Are Transportable Representations ☆19 · Jan 19, 2025 · Updated last year
- Research Artifact For Our Submission To VLDB ☆10 · Oct 27, 2021 · Updated 4 years ago
- Based on the R1-Zero method, using rule-based rewards and GRPO on the Code Contests dataset. ☆18 · Apr 22, 2025 · Updated 10 months ago
- [COLM 2024] Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation ☆15 · Jul 15, 2024 · Updated last year
- ☆143 · Jul 21, 2024 · Updated last year
- ☆15 · Oct 30, 2021 · Updated 4 years ago
- An NVIDIA AI Workbench example project for finetuning a Llama 3 8B Model ☆22 · Apr 29, 2025 · Updated 10 months ago
- Instruct-tuning LLaMA on consumer hardware with machine-translated data ☆19 · Apr 17, 2023 · Updated 2 years ago
- Repository for Sparse Universal Transformers ☆20 · Oct 23, 2023 · Updated 2 years ago
- Code release for Dataless Knowledge Fusion by Merging Weights of Language Models (https://openreview.net/forum?id=FCnohuR6AnM) ☆92 · Jul 25, 2023 · Updated 2 years ago
- ☆19 · Oct 31, 2022 · Updated 3 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 · Sep 15, 2023 · Updated 2 years ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Jul 19, 2024 · Updated last year
- ☆26 · May 30, 2023 · Updated 2 years ago
- ☆25 · Jun 29, 2025 · Updated 8 months ago
- [ICCV 2023 Oral] Official PyTorch implementation of our paper for semi-supervised continual learning "A soft nearest-neighbor framework f… ☆25 · Dec 17, 2024 · Updated last year
- [NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models ☆60 · Feb 7, 2025 · Updated last year
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆27 · Oct 28, 2024 · Updated last year
- Long Context Extension and Generalization in LLMs ☆63 · Sep 21, 2024 · Updated last year
- [NeurIPS 2023] Github repository for "Composing Parameter-Efficient Modules with Arithmetic Operations" ☆61 · Nov 26, 2023 · Updated 2 years ago
- A library for parameter-efficient and composable transfer learning for NLP with sparse fine-tunings. ☆75 · Aug 9, 2024 · Updated last year
- ☆32 · Oct 30, 2023 · Updated 2 years ago
- Must-read papers and blogs about parametric knowledge mechanism in LLMs. ☆35 · May 9, 2025 · Updated 10 months ago
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science ☆34 · Oct 25, 2024 · Updated last year
- ☆28 · Nov 10, 2025 · Updated 3 months ago
- Official repository of "LiNeS: Post-training Layer Scaling Prevents Forgetting and Enhances Model Merging" ☆32 · Nov 4, 2024 · Updated last year
- Adding new tasks to T0 without catastrophic forgetting ☆33 · Oct 20, 2022 · Updated 3 years ago
- ☆274 · Oct 31, 2023 · Updated 2 years ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆37 · Oct 11, 2023 · Updated 2 years ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Sep 18, 2025 · Updated 5 months ago
- Official implementation of Neuronal Time-Invariant Representations (NeuPRINT), NeurIPS 2023 ☆10 · Apr 17, 2024 · Updated last year
- A Data-Driven Approach to Predict the Success of Bank Telemarketing ☆10 · Apr 27, 2021 · Updated 4 years ago
- [ICLR 2025] DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference ☆49 · Jun 17, 2025 · Updated 8 months ago
- This repository is the official implementation of Topology-Informed Graph Transformer (Choi et al., GRaM Workshop at ICML 2024). ☆12 · Dec 28, 2024 · Updated last year
- A GameMaker Studio 2 plugin for the Newgrounds.io API ☆11 · Jan 12, 2023 · Updated 3 years ago
- ☆92 · Dec 23, 2024 · Updated last year
- Data preparation code for Amber 7B LLM ☆93 · May 10, 2024 · Updated last year
- [KDD Explore'24] Time Series Forecasting with LLMs: Understanding and Enhancing Model Capabilities ☆17 · May 7, 2025 · Updated 10 months ago