wolfecameron / nanoMoELinks

An extension of the nanoGPT repository for training small MOE models.

☆215

Alternatives and similar repositories for nanoMoE

Users that are interested in nanoMoE are comparing it to the libraries listed below

Sorting:

huggingface / picotron_tutorial
☆224Updated last week
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆313Updated last month
fangyuan-ksgk / Tiny-GRPO
minimal GRPO implementation from scratch
☆100Updated 8 months ago
allenai / OLMo-core
PyTorch building blocks for the OLMo ecosystem
☆482Updated this week
huggingface / kernels
Load compute kernels from the Hub
☆337Updated last week
NVIDIA-NeMo / Skills
A project to improve skills of large language models
☆628Updated this week
HazyResearch / lolcats
Repo for "LoLCATs: On Low-Rank Linearizing of Large Language Models"
☆249Updated 10 months ago
facebookresearch / memory
Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars…
☆360Updated 11 months ago
facebookresearch / LayerSkip
Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024
☆347Updated 7 months ago
EleutherAI / nanoGPT-mup
The simplest, fastest repository for training/finetuning medium-sized GPTs.
☆174Updated 5 months ago
shangshang-wang / Tina
Tina: Tiny Reasoning Models via LoRA
☆309Updated 2 months ago
NVlabs / hymba
☆202Updated 11 months ago
NVIDIA / ngpt
Normalized Transformer (nGPT)
☆194Updated last year
MekkCyber / TritonAcademy
A repository to unravel the language of GPUs, making their kernel conversations easy to understand
☆196Updated 6 months ago
rasbt / dora-from-scratch
LoRA and DoRA from Scratch Implementations
☆215Updated last year
microsoft / ArchScale
Simple & Scalable Pretraining for Neural Architecture Research
☆302Updated last month
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆316Updated this week
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆271Updated last week
keeeeenw / MicroLlama
Micro Llama is a small Llama based model with 300M parameters trained from scratch with $500 budget
☆162Updated 3 months ago
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆392Updated 5 months ago
FasterDecoding / BitDelta
☆203Updated 11 months ago
joey00072 / nanoGRPO
nanoGRPO is a lightweight implementation of Group Relative Policy Optimization (GRPO)
☆126Updated 6 months ago
snowflakedb / ArcticTraining
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
☆254Updated last week
llm-random / llm-random
☆205Updated last week
brendanhogan / DeepSeekRL-Extended
Exploring Applications of GRPO
☆249Updated 3 months ago
mengxiayu / LLMSuperWeight
Code for studying the super weight in LLM
☆121Updated last year
CASE-Lab-UMD / LLM-Drop
The official implementation of the paper "What Matters in Transformers? Not All Attention is Needed".
☆181Updated 2 weeks ago
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆69Updated 4 months ago
Pints-AI / 1.5-Pints
A compact LLM pretrained in 9 days by using high quality data
☆334Updated 7 months ago
HazyResearch / zoology
Understand and test language model architectures on synthetic tasks.
☆240Updated 2 months ago