Antlera / nanoGPT-moe
Enable MoE (mixture-of-experts) layers for nanoGPT.
☆34 · Updated last year
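For orientation, here is a minimal sketch of the kind of change the description implies: replacing the standard MLP in a nanoGPT-style transformer block with a top-k routed mixture-of-experts feed-forward layer. This is a generic illustration, not the repository's actual code; the names `MoEMLP`, `num_experts`, and `top_k` are assumptions made for the example.

```python
# Illustrative top-k mixture-of-experts feed-forward layer (not from the repo).
# It could stand in for the MLP inside a nanoGPT-style transformer block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEMLP(nn.Module):
    def __init__(self, n_embd: int, num_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        # Router that scores each token against every expert.
        self.gate = nn.Linear(n_embd, num_experts, bias=False)
        # Each expert mirrors nanoGPT's MLP: expand 4x, GELU, project back.
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(n_embd, 4 * n_embd),
                nn.GELU(),
                nn.Linear(4 * n_embd, n_embd),
            )
            for _ in range(num_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, n_embd); route tokens independently.
        B, T, C = x.shape
        tokens = x.view(-1, C)
        logits = self.gate(tokens)                      # (B*T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)  # top-k experts per token
        weights = F.softmax(weights, dim=-1)            # normalize routing weights
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            mask = (idx == e)                           # tokens routed to expert e
            if mask.any():
                token_ids, slot = mask.nonzero(as_tuple=True)
                out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.view(B, T, C)
```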
Alternatives and similar repositories for nanoGPT-moe
Users interested in nanoGPT-moe are comparing it to the repositories listed below.
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters · ☆130 · Updated 9 months ago
- Token Omission Via Attention · ☆128 · Updated 11 months ago
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… · ☆155 · Updated 5 months ago
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' · ☆230 · Updated 2 months ago
- ☆202 · Updated 9 months ago
- Multipack distributed sampler for fast padding-free training of LLMs · ☆201 · Updated last year
- EvaByte: Efficient Byte-level Language Models at Scale · ☆109 · Updated 5 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. · ☆172 · Updated 8 months ago
- Experiments on speculative sampling with Llama models · ☆128 · Updated 2 years ago
- ☆72 · Updated last year
- ☆53 · Updated 10 months ago
- ☆128 · Updated last year
- This is the official repository for Inheritune. · ☆113 · Updated 7 months ago
- A pipeline for LLM knowledge distillation · ☆109 · Updated 5 months ago
- Spherical Merge Pytorch/HF format Language Models with minimal feature loss. · ☆138 · Updated 2 years ago
- Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe… · ☆158 · Updated last year
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" · ☆246 · Updated 7 months ago
- Understand and test language model architectures on synthetic tasks. · ☆226 · Updated this week
- Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff" · ☆241 · Updated 3 months ago
- ☆122 · Updated 7 months ago
- This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024. · ☆104 · Updated last year
- RWKV-7: Surpassing GPT · ☆95 · Updated 10 months ago
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" · ☆144 · Updated last year
- ☆196 · Updated 3 weeks ago
- ☆100 · Updated last year
- [ICML 2024] CLLMs: Consistency Large Language Models · ☆404 · Updated 10 months ago
- Small and Efficient Mathematical Reasoning LLMs · ☆72 · Updated last year
- Replicating O1 inference-time scaling laws · ☆90 · Updated 9 months ago
- The official repo for "LLoCo: Learning Long Contexts Offline" · ☆116 · Updated last year
- The simplest, fastest repository for training/finetuning medium-sized GPTs. · ☆164 · Updated 3 months ago