NVIDIA-NeMo / Run
A tool to configure, launch, and manage your machine learning experiments.
☆190 · Updated this week
Alternatives and similar repositories for Run
Users interested in Run are comparing it to the libraries listed below.
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆265 · Updated last month
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆395 · Updated 2 weeks ago
- Load compute kernels from the Hub ☆271 · Updated this week
- Scalable and Performant Data Loading ☆299 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆266 · Updated 11 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆210 · Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆71 · Updated 5 months ago
- DTensor-native pretraining and fine-tuning for LLMs/VLMs with day-0 Hugging Face support, GPU-accelerated, and memory efficient. ☆71 · Updated this week
- Google TPU optimizations for transformers models ☆120 · Updated 7 months ago
- ☆217 · Updated 7 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆194 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆315 · Updated this week
- ☆294 · Updated last month
- Efficient LLM Inference over Long Sequences ☆391 · Updated 2 months ago
- Code for "LayerSkip: Enabling Early Exit Inference and Self-Speculative Decoding", ACL 2024 ☆334 · Updated 4 months ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆209 · Updated last week
- ☆216 · Updated 7 months ago
- Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native. ☆509 · Updated 4 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆225 · Updated this week
- Megatron's multi-modal data loader ☆243 · Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆224 · Updated last year
- LLM KV cache compression made easy ☆604 · Updated this week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆235 · Updated this week
- Large Context Attention ☆736 · Updated 7 months ago
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ☆374 · Updated 3 months ago
- Scalable toolkit for efficient model reinforcement ☆857 · Updated this week
- ☆118 · Updated last year
- Easy and Efficient Quantization for Transformers ☆203 · Updated 2 months ago
- Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients. ☆200 · Updated last year
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch ☆537 · Updated 3 months ago
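Several of the repositories above (the SDPA/Flash implementations, Ring Attention, long-sequence inference, and KV-cache compression) all center on the same core computation: scaled dot-product attention, softmax(QKᵀ/√d)·V. As a point of reference, here is a minimal pure-Python sketch of that formula; it is illustrative only and is not the implementation from any of the listed projects, which optimize the same math for memory and throughput.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sdpa(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of d-dimensional vectors (one per token).
    Returns one output vector per query token.
    """
    d = len(Q[0])
    out = []
    for q in Q:
        # Attention scores of this query against every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

# Tiny example: 2 query tokens, 2 key/value tokens, d = 2.
Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]
out = sdpa(Q, K, V)
```

The K and V lists here are exactly the "KV cache" that projects like kvpress compress, and partitioning the K/V token dimension across devices is the idea behind Ring Attention and the long-context attention repos listed above.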