NVIDIA-NeMo / Run
A tool to configure, launch and manage your machine learning experiments.
☆214 · Updated this week
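For context on the page's subject: NeMo-Run wraps a Python callable in a configurable task and dispatches it to an executor (local, Slurm, etc.). Below is a minimal sketch of that workflow, assuming the `nemo_run` package and the `run.Partial`, `run.LocalExecutor`, and `run.run` entry points as described in the project's README; exact signatures may differ between releases.

```python
# Minimal sketch of launching a task with NeMo-Run, assuming the
# nemo_run API (run.Partial, run.LocalExecutor, run.run) from the
# project's README; details may vary across versions.
import nemo_run as run

def train(lr: float = 1e-3, steps: int = 100) -> None:
    # Stand-in for a real training loop; any configurable callable works here.
    print(f"training for {steps} steps at lr={lr}")

if __name__ == "__main__":
    # run.Partial captures the callable plus its (re)configurable arguments.
    task = run.Partial(train, lr=3e-4, steps=10)
    # LocalExecutor runs on the local machine; swapping in a cluster
    # executor (e.g. Slurm) is how the same task scales out.
    run.run(task, executor=run.LocalExecutor())
```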
Alternatives and similar repositories for Run
Users interested in Run are comparing it to the libraries listed below.
- Scalable and Performant Data Loading ☆362 · Updated this week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash Attention… ☆278 · Updated last month
- Load compute kernels from the Hub ☆359 · Updated this week
- Google TPU optimizations for transformers models ☆132 · Updated 3 weeks ago
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆467 · Updated 2 weeks ago
- PyTorch Distributed native training library for LLMs/VLMs with OOTB Hugging Face support ☆245 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last month
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs welcome) ☆398 · Updated last week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆79 · Updated 3 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆227 · Updated last year
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆269 · Updated this week
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆204 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆218 · Updated this week
- ☆219 · Updated 11 months ago
- 👷 Build compute kernels ☆201 · Updated this week
- Megatron's multi-modal data loader ☆304 · Updated last week
- Simple and efficient DeepSeek V3 SFT using pipeline parallel and expert parallel, with both FP8 and BF16 training ☆112 · Updated 5 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of Optimum… ☆326 · Updated 3 months ago
- Simple & Scalable Pretraining for Neural Architecture Research ☆306 · Updated last month
- Efficient LLM Inference over Long Sequences ☆393 · Updated 6 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆190 · Updated this week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆277 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆233 · Updated this week
- Where GPUs get cooked 👩‍🍳🔥 ☆347 · Updated 3 months ago
- ☆224 · Updated last month
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆213 · Updated this week
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆328 · Updated 2 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆352 · Updated 8 months ago
- PyTorch-native post-training at scale ☆585 · Updated this week
- A family of compressed models obtained via pruning and knowledge distillation ☆362 · Updated 2 months ago