NVIDIA / NeMo-Run
A tool to configure, launch and manage your machine learning experiments.
☆133 · Updated this week
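Before the comparison list, a minimal sketch of what launching a task with NeMo-Run looks like, based on the `run.Partial` / `run.LocalExecutor` / `run.run` entry points from its README; treat the exact signatures as assumptions, and note that `train` here is a hypothetical stand-in for a real training entry point:

```python
import nemo_run as run

def train(steps: int = 10, lr: float = 1e-3) -> None:
    # Hypothetical stand-in for a real training function.
    print(f"training for {steps} steps at lr={lr}")

if __name__ == "__main__":
    # Configure the task declaratively, then launch it on a local executor.
    task = run.Partial(train, steps=100, lr=3e-4)
    run.run(task, executor=run.LocalExecutor())
```

The same configured task can be pointed at other executors (e.g. Slurm) without changing the task definition, which is the configure/launch/manage split the tagline describes.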
Alternatives and similar repositories for NeMo-Run:
Users interested in NeMo-Run are comparing it to the libraries listed below.
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆234 · Updated this week
- Google TPU optimizations for transformers models ☆104 · Updated 2 months ago
- PyTorch per step fault tolerance (actively under development) ☆271 · Updated last week
- Megatron's multi-modal data loader ☆183 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆222 · Updated 8 months ago
- Load compute kernels from the Hub ☆107 · Updated last week
- Scalable and Performant Data Loading ☆231 · Updated this week
- Applied AI experiments and examples for PyTorch ☆251 · Updated last week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆55 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆190 · Updated this week
- OpenAI compatible API for TensorRT LLM triton backend ☆202 · Updated 8 months ago
- ☆204 · Updated 2 months ago
- Fast low-bit matmul kernels in Triton ☆275 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 5 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆253 · Updated 8 months ago
- ☆176 · Updated this week
- NVIDIA Resiliency Extension is a Python package for framework developers and users to implement fault-tolerant features. It improves the … ☆131 · Updated last week
- Triton-based implementation of Sparse Mixture of Experts. ☆209 · Updated 4 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆151 · Updated this week
- some common Huggingface transformers in maximal update parametrization (µP) ☆80 · Updated 3 years ago
- A project to improve skills of large language models ☆260 · Updated this week
- PyTorch building blocks for the OLMo ecosystem ☆177 · Updated this week
- ☆102 · Updated 7 months ago
- ring-attention experiments (see the ring-attention sketch after this list) ☆128 · Updated 5 months ago
- ☆238 · Updated this week
- Simple implementation of Speculative Sampling in NumPy for GPT-2 (see the speculative-sampling sketch after this list). ☆92 · Updated last year
- ☆184 · Updated 6 months ago
- A family of compressed models obtained via pruning and knowledge distillation ☆331 · Updated 4 months ago
- ☆158 · Updated last month
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆181 · Updated this week
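For the ring-attention entry above, a self-contained NumPy sketch of the core idea: keys and values are split into chunks, as if sharded across devices in a ring, and attention is accumulated chunk by chunk with an online softmax so no full attention matrix is ever materialized. All names here are illustrative, not the linked repo's API:

```python
import numpy as np

def ring_attention(q, k, v, n_chunks):
    # q: (n, d); k, v: (m, d). Each K/V chunk plays the role of one
    # ring step; (m_run, l_run, out) is the running online-softmax state.
    d = q.shape[-1]
    m_run = np.full(q.shape[0], -np.inf)   # running max of logits
    l_run = np.zeros(q.shape[0])           # running softmax denominator
    out = np.zeros_like(q)                 # running weighted sum of values
    for kc, vc in zip(np.array_split(k, n_chunks), np.array_split(v, n_chunks)):
        s = q @ kc.T / np.sqrt(d)                      # local logits (n, chunk)
        m_new = np.maximum(m_run, s.max(axis=-1))
        p = np.exp(s - m_new[:, None])                 # local unnormalized weights
        scale = np.exp(m_run - m_new)                  # rescale old state
        l_run = l_run * scale + p.sum(axis=-1)
        out = out * scale[:, None] + p @ vc
        m_run = m_new
    return out / l_run[:, None]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q, k, v = rng.normal(size=(4, 16)), rng.normal(size=(32, 16)), rng.normal(size=(32, 16))
    # Reference: ordinary full-softmax attention.
    s = q @ k.T / np.sqrt(16)
    w = np.exp(s - s.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    assert np.allclose(ring_attention(q, k, v, n_chunks=4), w @ v)
```

The accumulation is the same online-softmax trick used by FlashAttention; the ring variant simply rotates the K/V chunks between devices instead of iterating locally.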
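And for the speculative-sampling entry, a toy NumPy version of the accept/reject scheme: a cheap draft model proposes k tokens, the target model accepts each with probability min(1, p/q), and on rejection resamples from the residual distribution, which keeps the output exactly target-distributed. The `draft_probs`/`target_probs` functions are hypothetical stand-ins for real GPT-2 forward passes:

```python
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # tiny vocabulary for illustration

def draft_probs(prefix):
    # Hypothetical cheap "draft" model: softmax over fixed logits.
    logits = np.linspace(1.0, 0.0, VOCAB) + 0.01 * len(prefix)
    e = np.exp(logits - logits.max())
    return e / e.sum()

def target_probs(prefix):
    # Hypothetical expensive "target" model with a different distribution.
    logits = np.cos(np.arange(VOCAB) + len(prefix))
    e = np.exp(logits - logits.max())
    return e / e.sum()

def speculative_step(prefix, k=4):
    # Phase 1: draft model proposes k tokens autoregressively.
    proposed, q_dists, ctx = [], [], list(prefix)
    for _ in range(k):
        q = draft_probs(ctx)
        t = rng.choice(VOCAB, p=q)
        proposed.append(t)
        q_dists.append(q)
        ctx.append(t)
    # Phase 2: target model verifies each proposal in turn.
    accepted, ctx = [], list(prefix)
    for t, q in zip(proposed, q_dists):
        p = target_probs(ctx)
        if rng.random() < min(1.0, p[t] / q[t]):   # accept with prob min(1, p/q)
            accepted.append(t)
            ctx.append(t)
        else:
            # Reject: resample from the residual max(p - q, 0), then stop.
            residual = np.maximum(p - q, 0.0)
            accepted.append(rng.choice(VOCAB, p=residual / residual.sum()))
            return accepted
    # All k drafts accepted: sample one bonus token from the target model.
    accepted.append(rng.choice(VOCAB, p=target_probs(ctx)))
    return accepted

print(speculative_step(prefix=[1, 2, 3]))
```

Each call thus yields between one and k+1 tokens for a single pass of target-model verification, which is where the inference speedup comes from.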