OpenSparseLLMs / LLaMA-MoE-v2
LLaMA-MoE v2: Exploring Sparsity of LLaMA from Perspective of Mixture-of-Experts with Post-Training
⭐ 91 · Updated last year
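For context on the technique named in the title: mixture-of-experts (MoE) models replace dense feed-forward blocks with sparsely activated experts selected per token by a learned router. The snippet below is a minimal, self-contained sketch of top-k expert routing in PyTorch, written only to illustrate the general idea; it is not code from this repository, and the expert count, layer sizes, and class names are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKMoELayer(nn.Module):
    """Toy sparse MoE feed-forward layer: each token is sent to its top-k experts."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        # Router scores every token against every expert.
        self.router = nn.Linear(d_model, n_experts, bias=False)
        # Each expert is an ordinary two-layer feed-forward block.
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model) -> flatten tokens for routing.
        tokens = x.reshape(-1, x.size(-1))
        logits = self.router(tokens)                       # (n_tokens, n_experts)
        weights, indices = logits.topk(self.k, dim=-1)     # keep only the top-k experts per token
        weights = F.softmax(weights, dim=-1)               # normalize the kept routing scores

        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = indices == e                            # which tokens routed to expert e, and in which slot
            token_ids, slot = mask.nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            # Weighted sum of expert outputs for the tokens that selected this expert.
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape_as(x)


if __name__ == "__main__":
    layer = TopKMoELayer(d_model=64, d_ff=256, n_experts=4, k=2)
    y = layer(torch.randn(2, 10, 64))
    print(y.shape)  # torch.Size([2, 10, 64])
</code omitted in real MoE systems: a load-balancing loss that spreads tokens across experts>
```

Real MoE conversions also typically add an auxiliary load-balancing loss and initialize experts from the original dense weights; this toy example omits both for brevity.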
Alternatives and similar repositories for LLaMA-MoE-v2
Users interested in LLaMA-MoE-v2 are comparing it to the libraries listed below.
- CoT-Valve: Length-Compressible Chain-of-Thought Tuning ⭐ 88 · Updated 10 months ago
- ⭐ 114 · Updated 3 months ago
- [ICML 2025] M-STAR (Multimodal Self-Evolving TrAining for Reasoning) Project. Diving into Self-Evolving Training for Multimodal Reasoning ⭐ 70 · Updated 5 months ago
- [EMNLP 2025] TokenSkip: Controllable Chain-of-Thought Compression in LLMs ⭐ 197 · Updated last month
- [EMNLP 2024 Findings🔥] Official implementation of "LOOK-M: Look-Once Optimization in KV Cache for Efficient Multimodal Long-Context In…" ⭐ 104 · Updated last year
- Laser: Learn to Reason Efficiently with Adaptive Length-based Reward Shaping ⭐ 61 · Updated 7 months ago
- [ICLR 2025] Dynamic Mixture of Experts: An Auto-Tuning Approach for Efficient Transformer Models ⭐ 152 · Updated 5 months ago
- Model merging is a highly efficient approach for long-to-short reasoning. ⭐ 94 · Updated 2 months ago
- Open-Pandora: On-the-fly Control Video Generation ⭐ 35 · Updated last year
- ⭐ 140 · Updated 3 months ago
- [ICML'25] Official code of the paper "Fast Large Language Model Collaborative Decoding via Speculation" ⭐ 28 · Updated 6 months ago
- ⭐ 126 · Updated 6 months ago
- [EMNLP 2025] LightThinker: Thinking Step-by-Step Compression ⭐ 126 · Updated 8 months ago
- ⭐ 46 · Updated 8 months ago
- [ACL'25] The official code repository for PRMBench: A Fine-grained and Challenging Benchmark for Process-Level Reward Models. ⭐ 86 · Updated 10 months ago
- ⭐ 136 · Updated 9 months ago
- This repo contains evaluation code for the paper "MileBench: Benchmarking MLLMs in Long Context" ⭐ 35 · Updated last year
- [ICML'25] Our study systematically investigates massive values in LLMs' attention mechanisms. First, we observe massive values are concen… ⭐ 85 · Updated 6 months ago
- [NeurIPS 2025] NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ⭐ 102 · Updated 3 months ago
- ⭐ 175 · Updated 3 weeks ago
- [NeurIPS 2025] Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains ⭐ 65 · Updated 5 months ago
- Official repository for the paper "DeepCritic: Deliberate Critique with Large Language Models" ⭐ 40 · Updated 6 months ago
- ACL'2025: SoftCoT: Soft Chain-of-Thought for Efficient Reasoning with LLMs, and preprint: SoftCoT++: Test-Time Scaling with Soft Chain-of… ⭐ 73 · Updated 7 months ago
- dParallel: Learnable Parallel Decoding for dLLMs ⭐ 51 · Updated 2 months ago
- [ICML 2024] Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibrati… ⭐ 46 · Updated last year
- Paper list, tutorial, and nano code snippets for Diffusion Large Language Models. ⭐ 148 · Updated 6 months ago
- Extrapolating RLVR to General Domains without Verifiers ⭐ 184 · Updated 4 months ago
- Official Repository of LatentSeek ⭐ 71 · Updated 6 months ago
- ⭐ 32 · Updated last month
- [ICLR 2025] When Attention Sink Emerges in Language Models: An Empirical View (Spotlight) ⭐ 151 · Updated 5 months ago