NVIDIA-NeMo / Automodel
Fine-tune any Hugging Face LLM or VLM on day-0 using PyTorch-native features for GPU-accelerated distributed training with superior performance and memory efficiency.
⭐40 · Updated last week
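For orientation, the sketch below is not the Automodel API (its exact entry points aren't shown on this page); it is a hedged illustration of the kind of PyTorch-native fine-tuning these projects target: a Hugging Face causal LM sharded with `torch.distributed.fsdp.FullyShardedDataParallel` and using SDPA attention. The checkpoint name, toy batch, and hyperparameters are placeholder assumptions.

```python
import os
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from transformers import AutoModelForCausalLM, AutoTokenizer


def main():
    # One process per GPU, launched by torchrun.
    dist.init_process_group("nccl")
    local_rank = int(os.environ["LOCAL_RANK"])  # set by torchrun
    torch.cuda.set_device(local_rank)

    model_name = "meta-llama/Llama-3.2-1B"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.bfloat16,
        attn_implementation="sdpa",  # PyTorch-native scaled_dot_product_attention
    ).cuda()

    # Shard parameters, gradients, and optimizer state across ranks.
    # A real setup would also pass an auto-wrap policy for per-layer sharding.
    model = FSDP(model, use_orig_params=True)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

    # Single toy step; a real run would iterate over a tokenized dataset.
    batch = tokenizer(["Hello, world!"], return_tensors="pt").to(local_rank)
    labels = batch["input_ids"].clone()

    model.train()
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

Launched with, for example, `torchrun --nproc_per_node=8 train.py`.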
Alternatives and similar repositories for Automodel
Users interested in Automodel are comparing it to the libraries listed below.
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… (a minimal SDPA sketch follows after this list) ⭐260 · Updated last month
- Load compute kernels from the Hub ⭐244 · Updated this week
- Scalable toolkit for efficient model reinforcement ⭐626 · Updated last week
- A tool to configure, launch and manage your machine learning experiments. ⭐182 · Updated this week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ⭐383 · Updated last week
- PyTorch Single Controller ⭐361 · Updated last week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐208 · Updated last week
- Triton-based implementation of Sparse Mixture of Experts. ⭐233 · Updated 8 months ago
- ⭐514 · Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ⭐224 · Updated last year
- ring-attention experiments ⭐149 · Updated 10 months ago
- LLM KV cache compression made easy ⭐586 · Updated this week
- ⭐211 · Updated 6 months ago
- ⭐123 · Updated 2 months ago
- ⭐162 · Updated last year
- Applied AI experiments and examples for PyTorch ⭐290 · Updated 2 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ⭐527 · Updated this week
- A JAX-native LLM Post-Training Library ⭐116 · Updated last week
- ⭐232 · Updated this week
- FlexAttention based, minimal vllm-style inference engine for fast Gemma 2 inference. ⭐250 · Updated 2 weeks ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ⭐67 · Updated 4 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ⭐200 · Updated last week
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ⭐536 · Updated 3 months ago
- Megatron's multi-modal data loader ⭐237 · Updated last week
- ⭐118 · Updated last year
- ⭐88 · Updated last year
- Official repository for DistFlashAttn: Distributed Memory-efficient Attention for Long-context LLMs Training ⭐214 · Updated last year
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ⭐56 · Updated this week
- Google TPU optimizations for transformers models ⭐118 · Updated 7 months ago
- ⭐324 · Updated 3 weeks ago
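Several entries above build on PyTorch's native `scaled_dot_product_attention` (SDPA), for example the FSDP pretraining item that mentions an "SDPA implementation of Flash…". As a minimal hedged sketch (shapes, dtypes, and the causal setting are illustrative assumptions, not any project's API), this is the call they wrap; on supported GPUs it can dispatch to a fused Flash-attention kernel:

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: (batch, heads, sequence length, head dim).
batch, heads, seq, head_dim = 2, 8, 1024, 64
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" else torch.float32

q = torch.randn(batch, heads, seq, head_dim, device=device, dtype=dtype)
k = torch.randn_like(q)
v = torch.randn_like(q)

# On supported GPUs this dispatches to a fused Flash-attention kernel;
# is_causal=True applies the autoregressive mask without materializing it.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([2, 8, 1024, 64])
```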