NVIDIA / NeMo-Run
A tool to configure, launch and manage your machine learning experiments.
☆139 · Updated this week
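As a quick illustration of what "configure, launch and manage" means in practice, below is a minimal sketch of defining and running a task with NeMo-Run's Python API. It assumes the `nemo_run` package and its `run.Partial`, `run.LocalExecutor`, and `run.run` entry points; treat it as an approximation and refer to the repository's own examples for authoritative usage.

```python
# Minimal NeMo-Run sketch (assumes the nemo_run package is installed;
# defer to the repository's examples for the exact, current API).
import nemo_run as run


def train(steps: int, lr: float) -> None:
    """Stand-in for a real training entrypoint."""
    print(f"training for {steps} steps at lr={lr}")


if __name__ == "__main__":
    # Configure the task: run.Partial captures the function and its
    # arguments as a reusable, overridable configuration.
    task = run.Partial(train, steps=10, lr=3e-4)

    # Launch it with an executor; LocalExecutor runs on the current machine,
    # while other executors target remote clusters without changing the task.
    run.run(task, executor=run.LocalExecutor())
```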
Alternatives and similar repositories for NeMo-Run:
Users interested in NeMo-Run are comparing it to the libraries listed below.
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… · ☆240 · Updated this week
- Megatron's multi-modal data loader · ☆191 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX · ☆223 · Updated 8 months ago
- Applied AI experiments and examples for PyTorch · ☆261 · Updated last month
- Scalable and Performant Data Loading · ☆237 · Updated this week
- PyTorch per step fault tolerance (actively under development) · ☆284 · Updated this week
- Google TPU optimizations for transformers models · ☆108 · Updated 3 months ago
- Fast low-bit matmul kernels in Triton · ☆291 · Updated this week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ☆192 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆262 · Updated 6 months ago
- Load compute kernels from the Hub · ☆115 · Updated last week
- Efficient LLM Inference over Long Sequences · ☆368 · Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference · ☆59 · Updated 3 weeks ago
- Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch · ☆511 · Updated 5 months ago
- Easy and Efficient Quantization for Transformers · ☆197 · Updated 2 months ago
- ring-attention experiments · ☆129 · Updated 6 months ago
- Triton-based implementation of Sparse Mixture of Experts. · ☆210 · Updated 4 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) · ☆185 · Updated this week
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS · ☆161 · Updated last month
- A safetensors extension to efficiently store sparse quantized tensors on disk · ☆100 · Updated this week
- Memory layers use a trainable key-value lookup mechanism to add extra parameters to a model without increasing FLOPs. Conceptually, spars… · ☆317 · Updated 4 months ago
- A family of compressed models obtained via pruning and knowledge distillation · ☆334 · Updated 5 months ago
- Large Context Attention · ☆704 · Updated 3 months ago