Antlera / nanoGPT-moe
Enable MoE (mixture-of-experts) for nanoGPT.
☆21 · Updated last year
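The repository's one-line description names the core idea: adding a mixture-of-experts (MoE) layer to nanoGPT. As an illustrative sketch only (not code from the repository; all names here are hypothetical), the heart of an MoE layer is a router that scores each token against every expert, keeps the top-k experts, and mixes their outputs with softmax weights over the kept scores:

```python
import math

def softmax(xs):
    # Numerically stable softmax over a plain list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def route_top_k(router_logits, k=2):
    """Return (expert_indices, mixing_weights) for one token."""
    # Keep the k highest-scoring experts.
    top = sorted(range(len(router_logits)),
                 key=lambda i: router_logits[i], reverse=True)[:k]
    # Renormalise the kept logits so the mixing weights sum to 1.
    weights = softmax([router_logits[i] for i in top])
    return top, weights

def moe_forward(x, experts, router_logits, k=2):
    """Mix the outputs of the top-k experts for a scalar input x."""
    idx, w = route_top_k(router_logits, k)
    return sum(wi * experts[i](x) for i, wi in zip(idx, w))

# Three toy "experts" standing in for per-expert MLPs.
experts = [lambda x: 2 * x, lambda x: -x, lambda x: x + 1]
out = moe_forward(3.0, experts, router_logits=[2.0, 0.1, 1.0], k=2)
```

In a real transformer the experts are feed-forward blocks, routing is done per token over a batch, and the router is a learned linear layer; the sketch only shows the top-k gating arithmetic.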
Alternatives and similar repositories for nanoGPT-moe:
Users interested in nanoGPT-moe are comparing it to the repositories listed below.
- Token Omission Via Attention ☆122 · Updated 3 months ago
- ☆30 · Updated 4 months ago
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆119 · Updated 5 months ago
- Tree Attention: Topology-Aware Decoding for Long-Context Attention on GPU Clusters ☆111 · Updated last month
- RWKV-7: Surpassing GPT ☆71 · Updated 2 months ago
- Replicating o1 inference-time scaling laws ☆70 · Updated last month
- ☆58 · Updated 8 months ago
- Code repository for the CURLoRA research paper: stable LLM continual fine-tuning and catastrophic-forgetting mitigation ☆41 · Updated 4 months ago
- ☆21 · Updated last week
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated 11 months ago
- ☆124 · Updated 11 months ago
- ☆43 · Updated 2 months ago
- Plug-and-play PyTorch implementation of the paper "Evolutionary Optimization of Model Merging Recipes" by Sakana AI ☆26 · Updated 2 months ago
- Official repository for Inheritune ☆109 · Updated 3 months ago
- A single repo with all scripts and utilities to train or fine-tune the Mamba model, with or without FIM ☆50 · Updated 9 months ago
- Code for "RATIONALYST: Pre-training Process-Supervision for Improving Reasoning" (https://arxiv.org/pdf/2410.01044) ☆30 · Updated 3 months ago
- Understand and test language model architectures on synthetic tasks ☆175 · Updated this week
- Certified Reasoning with Language Models ☆30 · Updated last year
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆204 · Updated 3 weeks ago
- Repo hosting code and materials on speeding up LLM inference using token merging ☆34 · Updated 8 months ago
- Code for PHATGOOSE, introduced in "Learning to Route Among Specialized Experts for Zero-Shot Generalization" ☆80 · Updated 10 months ago
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆66 · Updated 9 months ago
- Mixture of A Million Experts ☆33 · Updated 5 months ago
- ☆69 · Updated 5 months ago
- ☆65 · Updated 6 months ago
- ☆180 · Updated this week
- ☆62 · Updated 3 months ago
- "Improving Mathematical Reasoning with Process Supervision" by OpenAI ☆101 · Updated this week
- PyTorch implementation of the PEER block from the paper "Mixture of A Million Experts" by Xu Owen He at DeepMind ☆115 · Updated 4 months ago
- Official repository for the ICML 2024 paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" ☆93 · Updated 6 months ago
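Several entries above (CURLoRA, LoLCATs, Flora) concern low-rank adapters. As a minimal sketch of the shared idea (not code from any repository listed; names are hypothetical), LoRA-style adaptation replaces a full weight update dW of shape d_out × d_in with the product B·A of two thin matrices (d_out × r and r × d_in), so only r·(d_in + d_out) parameters are trained instead of d_in·d_out:

```python
def matvec(M, v):
    # Plain matrix-vector product over nested lists.
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def lora_forward(W, A, B, x, scale=1.0):
    """y = W x + scale * B (A x); W is frozen, only A and B are trained."""
    base = matvec(W, x)
    low_rank = matvec(B, matvec(A, x))
    return [b + scale * l for b, l in zip(base, low_rank)]

# Toy sizes: d_in = d_out = 2, rank r = 1.
W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (identity here)
A = [[1.0, 1.0]]               # r x d_in down-projection
B = [[0.5], [0.5]]             # d_out x r up-projection
y = lora_forward(W, A, B, [2.0, 4.0])
```

With d_in = d_out = 4096 and r = 8, the adapter trains 8 × (4096 + 4096) = 65,536 parameters instead of 16.8 million, which is why the papers above can afford per-task or continual adapters.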