Antlera / nanoGPT-moe
Enable MoE for nanoGPT.
☆24 · Updated last year
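For context on what "enable MoE for nanoGPT" typically involves, here is a minimal sketch (not nanoGPT-moe's actual code) of replacing nanoGPT's dense MLP block with a top-k routed mixture-of-experts layer. All names (`MoE`, `n_embd`, `n_expert`, `top_k`) are illustrative assumptions, and the auxiliary load-balancing loss used in practice is omitted for brevity.

```python
# Hypothetical sketch of a top-k routed MoE layer that could stand in for
# nanoGPT's dense MLP; not taken from the nanoGPT-moe repository.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoE(nn.Module):
    def __init__(self, n_embd: int, n_expert: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(n_embd, n_expert, bias=False)
        # Each expert mirrors nanoGPT's 4x-expanded MLP block.
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd),
                nn.GELU(),
                nn.Linear(4 * n_embd, n_embd),
            )
            for _ in range(n_expert)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        flat = x.view(-1, C)                            # (B*T, C) token stream
        logits = self.router(flat)                      # (B*T, n_expert)
        weights, idx = logits.topk(self.top_k, dim=-1)  # route each token to top_k experts
        weights = F.softmax(weights, dim=-1)            # renormalize over the chosen experts
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            # Rows of tokens routed to expert e, and which top-k slot chose it.
            rows, slot = (idx == e).nonzero(as_tuple=True)
            if rows.numel() == 0:
                continue
            out[rows] += weights[rows, slot].unsqueeze(-1) * expert(flat[rows])
        return out.view(B, T, C)

# Usage: drop in where nanoGPT's Block builds `self.mlp`.
x = torch.randn(2, 16, 64)
print(MoE(64)(x).shape)  # torch.Size([2, 16, 64])
```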
Alternatives and similar repositories for nanoGPT-moe:
Users interested in nanoGPT-moe are comparing it to the repositories listed below.
- Token Omission Via Attention ☆124 · Updated 5 months ago
- ☆60 · Updated 11 months ago
- The code repository for the CURLoRA research paper: stable LLM continual fine-tuning and catastrophic-forgetting mitigation. ☆43 · Updated 7 months ago
- Official repository for the ICML 2024 paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors". ☆102 · Updated 9 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆85 · Updated 2 weeks ago
- Repository for the paper "Stream of Search: Learning to Search in Language" ☆142 · Updated 2 months ago
- ☆67 · Updated 8 months ago
- A repository for research on medium-sized language models. ☆76 · Updated 10 months ago
- ☆31 · Updated 2 months ago
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated last year
- NeurIPS 2024 tutorial on LLM Inference ☆39 · Updated 3 months ago
- ☆74 · Updated 7 months ago
- PyTorch implementation of the PEER block from the paper "Mixture of a Million Experts" by Xu Owen He at DeepMind ☆122 · Updated 7 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆26 · Updated 6 months ago
- RWKV-7: Surpassing GPT ☆82 · Updated 4 months ago
- A pure and fast NumPy implementation of Mamba with cache support. ☆17 · Updated 9 months ago
- Code for the NeurIPS 2024 paper "Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization" ☆186 · Updated 4 months ago
- ☆182 · Updated this week
- A single repo with all scripts and utils to train/fine-tune the Mamba model, with or without FIM ☆54 · Updated 11 months ago
- Intelligent Go-Explore: Standing on the Shoulders of Giant Foundation Models ☆54 · Updated last month
- Lottery Ticket Adaptation ☆39 · Updated 4 months ago
- Like ARC, but with code to generate visual puzzles; 1D puzzles first. ☆17 · Updated 7 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆141 · Updated 6 months ago
- ☆18 · Updated 6 months ago
- ☆50 · Updated 5 months ago
- ☆37 · Updated 6 months ago
- The simplest, fastest repository for training/finetuning medium-sized xLSTMs. ☆42 · Updated 10 months ago
- Understand and test language model architectures on synthetic tasks. ☆185 · Updated 3 weeks ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models ☆41 · Updated 9 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆125 · Updated 4 months ago