NVIDIA-NeMo / AutomodelLinks

Pytorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support

☆194

Alternatives and similar repositories for Automodel

Users that are interested in Automodel are comparing it to the libraries listed below

Sorting:

NVIDIA-NeMo / Megatron-Bridge
HuggingFace conversion and training library for Megatron-based models
☆228Updated this week
radixark / miles
☆344Updated this week
huggingface / kernels
Load compute kernels from the Hub
☆337Updated last week
Dao-AILab / grouped-latent-attention
☆132Updated 6 months ago
Infini-AI-Lab / MagicDec
[ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding
☆132Updated last year
snowflakedb / ArcticTraining
ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs)
☆257Updated this week
nil0x9 / flash-muon
Flash-Muon: An Efficient Implementation of Muon Optimizer
☆212Updated 5 months ago
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆271Updated last week
tilde-research / MoMoE-impl
Memory optimized Mixture of Experts
☆69Updated 4 months ago
NVIDIA / Star-Attention
Efficient LLM Inference over Long Sequences
☆392Updated 5 months ago
snowflakedb / ArcticInference
ArcticInference: vLLM plugin for high-throughput, low-latency inference
☆327Updated this week
IsaacRe / vllm-kvcompress
KV cache compression for high-throughput LLM inference
☆145Updated 10 months ago
ServiceNow / PipelineRL
A scalable asynchronous reinforcement learning implementation with in-flight weight updates.
☆316Updated last week
NVIDIA / Megatron-Energon
Megatron's multi-modal data loader
☆280Updated this week
vllm-project / tpu-inference
TPU inference for vLLM, with unified JAX and PyTorch support.
☆170Updated last week
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆217Updated last week
NVlabs / QeRL
QeRL enables RL for 32B LLMs on a single H100 GPU.
☆459Updated last week
PiotrNawrot / sparse-frontier
The evaluation framework for training-free sparse attention in LLMs
☆106Updated last month
siyan-zhao / prepacking
The source code of our work "Prepacking: A Simple Method for Fast Prefilling and Increased Throughput in Large Language Models" [AISTATS …
☆60Updated last year
NVIDIA-NeMo / Run
A tool to configure, launch and manage your machine learning experiments.
☆208Updated last week
gpu-mode / ring-attention
ring-attention experiments
☆160Updated last year
amazon-science / mxfp4-llm
Official implementation for Training LLMs with MXFP4
☆110Updated 7 months ago
apple / ml-recurrent-drafter
☆219Updated 10 months ago
hao-ai-lab / Dynasor
[NeurIPS 2025] Simple extension on vLLM to help you speed up reasoning model without training.
☆209Updated 6 months ago
meta-pytorch / torchforge
PyTorch-native post-training at scale
☆549Updated last week
shawntan / scattermoe
Triton-based implementation of Sparse Mixture of Experts.
☆253Updated 2 months ago
changjonathanc / flex-nano-vllm
FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference.
☆313Updated last month
RulinShao / LightSeq
Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training
☆218Updated last year
lmgame-org / GRL
Multi-Turn RL Training System with AgentTrainer for Language Model Game Reinforcement Learning
☆53Updated 3 weeks ago
huggingface / picotron_tutorial
☆224Updated last week