kamanphoebe / Look-into-MoEs
[NAACL 2025] A Closer Look into Mixture-of-Experts in Large Language Models
☆60 · Feb 7, 2025 · Updated last year
Alternatives and similar repositories for Look-into-MoEs
Users interested in Look-into-MoEs are comparing it to the repositories listed below.
- ☆91 · Aug 18, 2024 · Updated last year
- Paper list for the paper "Authorship Attribution in the Era of Large Language Models: Problems, Methodologies, and Challenges (SIGKDD Exp… ☆18 · Dec 23, 2024 · Updated last year
- Code for the paper "Cottention: Linear Transformers With Cosine Attention" ☆20 · Nov 15, 2025 · Updated 3 months ago
- Implementation for MomentumSMoE ☆19 · Apr 19, 2025 · Updated 9 months ago
- ☆16 · Jul 23, 2024 · Updated last year
- [NeurIPS 2024] Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study ☆59 · Nov 24, 2024 · Updated last year
- The open-source materials for the paper "Sparsing Law: Towards Large Language Models with Greater Activation Sparsity". ☆30 · Nov 12, 2024 · Updated last year
- ☆22 · Sep 2, 2025 · Updated 5 months ago
- 🚀 LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training ☆91 · Dec 3, 2024 · Updated last year
- [NeurIPS-2024] 📈 Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies https://arxiv.org/abs/2407.13623 ☆89 · Sep 26, 2024 · Updated last year
- MoE-Visualizer is a tool designed to visualize the selection of experts in Mixture-of-Experts (MoE) models. ☆16 · Apr 8, 2025 · Updated 10 months ago
- Public code release for the paper "Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training" ☆11 · Oct 27, 2025 · Updated 3 months ago
- Improving transparency of large language models' reasoning ☆14 · Nov 25, 2025 · Updated 2 months ago
- LongAttn: Selecting Long-context Training Data via Token-level Attention ☆15 · Jul 16, 2025 · Updated 6 months ago
- This repository contains data, code and models for contextual noncompliance. ☆25 · Jul 18, 2024 · Updated last year
- Implementation for the paper: CMoE: Fast Carving of Mixture-of-Experts for Efficient LLM Inference ☆34 · Mar 6, 2025 · Updated 11 months ago
- Codebase for character-centric story understanding ☆14 · Jan 20, 2022 · Updated 4 years ago
- Fast and Slow Generating: An Empirical Study on Large and Small Language Models Collaborative Decoding. ☆13 · Nov 19, 2024 · Updated last year
- ☆20 · Oct 22, 2025 · Updated 3 months ago
- Code for paper: Unraveling the Shift of Visual Information Flow in MLLMs: From Phased Interaction to Efficient Inference ☆12 · Jun 7, 2025 · Updated 8 months ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert… ☆14 · Feb 4, 2025 · Updated last year
- [ACL 2025] Analyzing LLMs' Multilingual Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations ☆16 · Oct 18, 2025 · Updated 3 months ago
- ☆31 · Nov 11, 2024 · Updated last year
- [ICML 2024] When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models ☆35 · Jun 12, 2024 · Updated last year
- [ICLR 2025] Weighted-Reward Preference Optimization for Implicit Model Fusion ☆13 · Mar 17, 2025 · Updated 10 months ago
- Gemstones: A Model Suite for Multi-Faceted Scaling Laws (NeurIPS 2025) ☆32 · Sep 28, 2025 · Updated 4 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆16 · Dec 19, 2024 · Updated last year
- ☆16 · Mar 1, 2025 · Updated 11 months ago
- This repository contains the code for the paper "TaylorShift: Shifting the Complexity of Self-Attention from Squared to Linear (and Back)… ☆13 · Mar 17, 2025 · Updated 10 months ago
- ☆14 · Dec 28, 2022 · Updated 3 years ago
- Efficient Scaling laws and collaborative pretraining. ☆20 · Sep 18, 2025 · Updated 4 months ago
- From Accuracy to Robustness: A Study of Rule- and Model-based Verifiers in Mathematical Reasoning. ☆24 · Oct 7, 2025 · Updated 4 months ago
- ModuleFormer is a MoE-based architecture that includes two different types of experts: stick-breaking attention heads and feedforward exp… ☆226 · Sep 18, 2025 · Updated 4 months ago
- Inference Code for Paper "Harder Tasks Need More Experts: Dynamic Routing in MoE Models" ☆67 · Jul 30, 2024 · Updated last year
- BitLinear implementation ☆35 · Jan 1, 2026 · Updated last month
- The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed". ☆188 · Nov 14, 2025 · Updated 3 months ago
- [EMNLP 2024] Quantize LLM to extremely low-bit, and finetune the quantized LLMs ☆15 · Jul 18, 2024 · Updated last year
- The benchmark and datasets of the ICML 2024 paper "VisionGraph: Leveraging Large Multimodal Models for Graph Theory Problems in Visual C… ☆17 · May 27, 2024 · Updated last year
- Information and artifacts for "LoRA Learns Less and Forgets Less" (TMLR, 2024) ☆20 · Sep 27, 2024 · Updated last year