Antlera / nanoGPT-moeLinks

Enable moe for nanogpt.

☆34

Alternatives and similar repositories for nanoGPT-moe

Users that are interested in nanoGPT-moe are comparing it to the libraries listed below

Sorting:

Zyphra / tree_attention
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
☆130Updated 9 months ago
schwartz-lab-NLP / TOVA
Token Omission Via Attention
☆128Updated 11 months ago
whyNLP / LCKV
Layer-Condensed KV cache w/ 10 times larger batch size, fewer params and less computation. Dramatic speed up with better task performance…
☆155Updated 5 months ago
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆230Updated 2 months ago
FasterDecoding / BitDelta
☆202Updated 9 months ago
imoneoi / multipack
Multipack distributed sampler for fast padding-free training of LLMs
☆201Updated last year
OpenEvaByte / evabyte
EvaByte: Efficient Byte-level Language Models at Scale
☆109Updated 5 months ago
casper-hansen / OpenCoconut
OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding.
☆172Updated 8 months ago
dust-tt / llama-ssp
Experiments on speculative sampling with Llama models
☆128Updated 2 years ago
JacobPfau / fillerTokens
☆72Updated last year
wdlctc / mini-s
☆53Updated 10 months ago
HanGuo97 / lq-lora
☆128Updated last year
sanyalsunny111 / LLM-Inheritune
This is the official repository for Inheritune.
☆113Updated 7 months ago
golololologol / LLM-Distillery
A pipeline for LLM knowledge distillation
☆109Updated 5 months ago
Digitous / LLM-SLERP-Merge
Spherical Merge Pytorch/HF format Language Models with minimal feature loss.
☆138Updated 2 years ago
uukuguy / multi_loras
Load multiple LoRA modules simultaneously and automatically switch the appropriate combination of LoRA modules to generate the best answe…
☆158Updated last year
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆246Updated 7 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆226Updated this week
HazyResearch / based
Code for exploring Based models from "Simple linear attention language models balance the recall-throughput tradeoff"
☆241Updated 3 months ago
SalesforceAIResearch / LaTRO
☆122Updated 7 months ago
BorealisAI / flora-opt
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
☆104Updated last year
BlinkDL / modded-nanogpt-rwkv
RWKV-7: Surpassing GPT
☆95Updated 10 months ago
yandex-research / swarm
Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient"
☆144Updated last year
llm-random / llm-random
☆196Updated 3 weeks ago
architsharma97 / dpo-rlaif
☆100Updated last year
hao-ai-lab / Consistency_LLM
[ICML 2024] CLLMs: Consistency Large Language Models
☆404Updated 10 months ago
akjindal53244 / Arithmo
Small and Efficient Mathematical Reasoning LLMs
☆72Updated last year
hughbzhang / o1_inference_scaling_laws
Replicating O1 inference-time scaling laws
☆90Updated 9 months ago
jeffreysijuntan / lloco
The official repo for "LLoCo: Learning Long Contexts Offline"
☆116Updated last year
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆164Updated 3 months ago