Antlera / nanoGPT-moe
Enables MoE (Mixture of Experts) for nanoGPT; a minimal sketch of the idea follows below.
☆30 · Updated last year
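The repository's name suggests swapping nanoGPT's dense MLP for a Mixture-of-Experts layer with learned token routing. The sketch below is not taken from this repository; it is a minimal, hypothetical illustration of top-k routing over a small pool of expert MLPs, where the class names and the `n_embd`, `num_experts`, and `top_k` parameters are assumptions for illustration only.

```python
# Hypothetical sketch of a top-k Mixture-of-Experts MLP that could replace
# nanoGPT's dense MLP inside a transformer block. Not code from Antlera/nanoGPT-moe.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertMLP(nn.Module):
    """One expert: the same 4x-expansion MLP shape nanoGPT uses."""
    def __init__(self, n_embd: int):
        super().__init__()
        self.fc = nn.Linear(n_embd, 4 * n_embd)
        self.proj = nn.Linear(4 * n_embd, n_embd)

    def forward(self, x):
        return self.proj(F.gelu(self.fc(x)))

class MoEMLP(nn.Module):
    """Route each token to its top-k experts and mix their outputs."""
    def __init__(self, n_embd: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([ExpertMLP(n_embd) for _ in range(num_experts)])
        self.gate = nn.Linear(n_embd, num_experts, bias=False)
        self.top_k = top_k

    def forward(self, x):
        B, T, C = x.shape
        tokens = x.reshape(B * T, C)                     # flatten batch and time
        logits = self.gate(tokens)                       # (B*T, num_experts)
        weights, idx = logits.topk(self.top_k, dim=-1)   # top-k experts per token
        weights = F.softmax(weights, dim=-1)             # renormalise their scores
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # topk returns distinct expert indices, so each token hits expert e at most once
            token_ids, slot = (idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            out[token_ids] += weights[token_ids, slot].unsqueeze(-1) * expert(tokens[token_ids])
        return out.reshape(B, T, C)

# Usage: drop in wherever a nanoGPT-style Block would otherwise call self.mlp(x)
x = torch.randn(2, 16, 64)
print(MoEMLP(n_embd=64)(x).shape)  # torch.Size([2, 16, 64])
```

Real MoE training setups typically add an auxiliary load-balancing loss on the gate so experts receive comparable traffic; that is omitted here to keep the sketch short.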
Alternatives and similar repositories for nanoGPT-moe
Users interested in nanoGPT-moe are comparing it to the repositories listed below:
- Token Omission Via Attention ☆126 · Updated 7 months ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆126 · Updated 6 months ago
- ☆45 · Updated last year
- This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024. ☆104 · Updated 11 months ago
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆172 · Updated 4 months ago
- Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models" ☆237 · Updated 4 months ago
- ☆61 · Updated last year
- A repository for research on medium sized language models. ☆76 · Updated last year
- RWKV-7: Surpassing GPT ☆88 · Updated 6 months ago
- Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of… ☆134 · Updated 9 months ago
- Replicating O1 inference-time scaling laws ☆87 · Updated 6 months ago
- ☆114 · Updated 3 months ago
- EvaByte: Efficient Byte-level Language Models at Scale ☆101 · Updated last month
- Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance… ☆149 · Updated 2 months ago
- This repo is based on https://github.com/jiaweizzhao/GaLore ☆28 · Updated 8 months ago
- ☆45 · Updated last year
- ☆51 · Updated 7 months ago
- This is the official repository for Inheritune. ☆111 · Updated 3 months ago
- Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044 ☆33 · Updated 8 months ago
- ☆25 · Updated 4 months ago
- ☆125 · Updated last year
- The code repository for the CURLoRA research paper. Stable LLM continual fine-tuning and catastrophic forgetting mitigation. ☆44 · Updated 9 months ago
- look how they massacred my boy ☆63 · Updated 7 months ago
- Code to reproduce "Transformers Can Do Arithmetic with the Right Embeddings", McLeish et al (NeurIPS 2024) ☆190 · Updated last year
- ☆197 · Updated 6 months ago
- Comprehensive toolkit for Reinforcement Learning from Human Feedback (RLHF) training, featuring instruction fine-tuning, reward model tra… ☆157 · Updated last year
- A pipeline for LLM knowledge distillation ☆104 · Updated 2 months ago
- ☆174 · Updated last month
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience" ☆60 · Updated 2 months ago
- Evaluating LLMs with fewer examples ☆156 · Updated last year