AviSoori1x / makeMoE
From scratch implementation of a sparse mixture of experts language model inspired by Andrej Karpathy's makemore :)
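As context for the listing below: the core of a sparse mixture-of-experts layer like makeMoE's is top-k gating, where a router scores every expert per token and only the k highest-scoring experts run. A minimal sketch of that routing step in NumPy (the function and argument names here are illustrative, not taken from the makeMoE codebase):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sparse_moe_forward(x, w_gate, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:       (tokens, d_model) input activations
    w_gate:  (d_model, n_experts) router weight matrix
    experts: list of callables, each mapping (d_model,) -> (d_model,)
    k:       number of experts active per token
    """
    logits = x @ w_gate                          # (tokens, n_experts) router scores
    topk = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # renormalize gate weights over the selected experts only
        gates = softmax(logits[t, topk[t]])
        for g, e in zip(gates, topk[t]):
            out[t] += g * experts[e](x[t])       # weighted sum of expert outputs
    return out
```

Because the gates are renormalized to sum to 1 over the chosen experts, identity experts reproduce the input exactly; real implementations replace the per-token loop with batched dispatch and add a load-balancing loss.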
☆727 · Updated 8 months ago
Alternatives and similar repositories for makeMoE
Users interested in makeMoE are comparing it to the libraries listed below.
- Understanding R1-Zero-Like Training: A Critical Perspective ☆1,023 · Updated 2 weeks ago
- A family of open-sourced Mixture-of-Experts (MoE) Large Language Models ☆1,557 · Updated last year
- ☆946 · Updated 5 months ago
- [COLM 2025] LIMO: Less is More for Reasoning ☆977 · Updated last week
- Reference implementation of Megalodon 7B model ☆520 · Updated last month
- DataComp for Language Models ☆1,324 · Updated 3 months ago
- Unleashing the Power of Reinforcement Learning for Math and Code Reasoners ☆645 · Updated last month
- Implementation of the training framework proposed in Self-Rewarding Language Model, from MetaAI ☆1,392 · Updated last year
- OLMoE: Open Mixture-of-Experts Language Models ☆809 · Updated 4 months ago
- Memory optimization and training recipes to extrapolate language models' context length to 1 million tokens, with minimal hardware. ☆735 · Updated 9 months ago
- Recipes to scale inference-time compute of open models ☆1,101 · Updated last month
- Large Reasoning Models ☆805 · Updated 7 months ago
- A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs). ☆868 · Updated 2 weeks ago
- Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model tra… ☆162 · Updated last year
- Single File, Single GPU, From Scratch, Efficient, Full Parameter Tuning library for "RL for LLMs" ☆497 · Updated last week
- FuseAI Project ☆578 · Updated 5 months ago
- Code for Quiet-STaR ☆735 · Updated 10 months ago
- MINT-1T: A one trillion token multimodal interleaved dataset. ☆819 · Updated 11 months ago
- nanoGPT style version of Llama 3.1 ☆1,394 · Updated 11 months ago
- Minimal hackable GRPO implementation ☆252 · Updated 5 months ago
- Muon is Scalable for LLM Training ☆1,093 · Updated 3 months ago
- [ICLR 2025] Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing. Your efficient and high-quality synthetic data … ☆728 · Updated 3 months ago
- [NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention… ☆1,067 · Updated 3 weeks ago
- [ACL 2024] Progressive LLaMA with Block Expansion. ☆505 · Updated last year
- The official implementation of Self-Play Fine-Tuning (SPIN) ☆1,172 · Updated last year
- [ICML 2024] CLLMs: Consistency Large Language Models ☆396 · Updated 8 months ago
- [NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward ☆908 · Updated 5 months ago
- ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024) ☆973 · Updated 7 months ago
- [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning ☆621 · Updated last year