Antlera / nanoGPT-moe
Enable MoE (mixture of experts) for nanoGPT.
☆21 · Updated 11 months ago
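As a rough illustration of what "enabling MoE for nanoGPT" usually involves, the sketch below replaces the dense MLP in a Transformer block with several expert MLPs and a learned top-k router. This is a minimal sketch under common MoE assumptions, not this repository's actual code; names such as `MoEMLP`, `n_expert`, and `top_k` are illustrative.

```python
# Minimal sketch (assumed, not Antlera/nanoGPT-moe's implementation) of a
# token-level top-k mixture-of-experts MLP that could stand in for nanoGPT's
# dense MLP inside each Transformer block.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertMLP(nn.Module):
    """nanoGPT-style feed-forward network used as a single expert."""
    def __init__(self, n_embd):
        super().__init__()
        self.c_fc = nn.Linear(n_embd, 4 * n_embd)
        self.c_proj = nn.Linear(4 * n_embd, n_embd)

    def forward(self, x):
        return self.c_proj(F.gelu(self.c_fc(x)))

class MoEMLP(nn.Module):
    """Routes each token to its top-k experts and mixes their outputs."""
    def __init__(self, n_embd, n_expert=4, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList([ExpertMLP(n_embd) for _ in range(n_expert)])
        self.router = nn.Linear(n_embd, n_expert, bias=False)
        self.top_k = top_k

    def forward(self, x):
        B, T, C = x.shape
        tokens = x.view(-1, C)                            # (B*T, C)
        logits = self.router(tokens)                      # (B*T, n_expert)
        weights, idx = logits.topk(self.top_k, dim=-1)    # k experts per token
        weights = F.softmax(weights, dim=-1)              # normalize over chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            for slot in range(self.top_k):
                mask = idx[:, slot] == e                  # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out.view(B, T, C)

# Usage: in nanoGPT's Block, swap the dense MLP for MoEMLP(config.n_embd).
x = torch.randn(2, 8, 64)
print(MoEMLP(64)(x).shape)  # torch.Size([2, 8, 64])
```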
Related projects
Alternatives and complementary repositories for nanoGPT-moe
- Token Omission Via Attention ☆121 · Updated last month
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆104 · Updated last month
- Repository for the paper Stream of Search: Learning to Search in Language ☆93 · Updated 3 months ago
- This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" (ICML 2024) ☆80 · Updated 4 months ago
- Parameter-Efficient Sparsity Crafting From Dense to Mixture-of-Experts for Instruction Tuning on General Tasks ☆129 · Updated 2 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆87 · Updated 3 months ago
- Code repository for the c-BTM paper ☆105 · Updated last year
- "Improving Mathematical Reasoning with Process Supervision" by OPENAI☆83Updated last week
- The code repository for the CURLoRA research paper: stable LLM continual fine-tuning and catastrophic forgetting mitigation ☆38 · Updated 2 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file ☆128 · Updated last month
- Small and Efficient Mathematical Reasoning LLMs ☆71 · Updated 9 months ago
- PyTorch implementation of models from the Zamba2 series ☆158 · Updated this week
- Official Implementation of "HMT: Hierarchical Memory Transformer for Long Context Language Processing" ☆62 · Updated this week
- Implementation of the Quiet-STaR paper (https://arxiv.org/pdf/2403.09629.pdf) ☆42 · Updated 3 months ago
- Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient" ☆128 · Updated 11 months ago
- Llemma formal2formal (tactic prediction) theorem-proving experiments ☆17 · Updated last year
- Simple and efficient PyTorch-native transformer training and inference (batched) ☆61 · Updated 7 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆80 · Updated last week
- A new way to generate large quantities of high-quality synthetic data (on par with GPT-4), with better controllability, at a fraction of … ☆21 · Updated last month
- PB-LLM: Partially Binarized Large Language Models ☆148 · Updated last year
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al. (NeurIPS 2024) ☆179 · Updated 5 months ago