NVIDIA / nvidia-dlfw-inspect
The tool facilitates debugging convergence issues and testing new algorithms and recipes for training LLMs with NVIDIA libraries such as Transformer Engine, Megatron-LM, and NeMo.
☆18 · Updated 4 months ago
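For context, the tool is typically imported into a training script and pointed at a YAML config that selects which debug features to enable for which layers. The sketch below follows the usage pattern documented for Transformer Engine's precision-debug tooling; the module path, argument names, and file paths are assumptions to verify against the repository's README.

```python
# Minimal sketch (assumed API, per the Transformer Engine precision-debug docs):
# initialize the inspection framework before building the model, pointing it at
# a YAML config that selects debug features (e.g. logging tensor statistics).
import nvdlfw_inspect.api as debug_api

debug_api.initialize(
    config_file="./debug_config.yaml",  # placeholder: feature-selection config
    feature_dirs=["./debug_features"],  # placeholder: directories with feature implementations
)

# ...then build and train the Transformer Engine / Megatron-LM model as usual;
# the enabled features hook into the layers named in the config.
```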
Alternatives and similar repositories for nvidia-dlfw-inspect
Users interested in nvidia-dlfw-inspect are comparing it to the libraries listed below.
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆391 · Updated this week
- LLM KV cache compression made easy ☆876 · Updated 2 weeks ago
- Applied AI experiments and examples for PyTorch ☆315 · Updated 5 months ago
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark + Toolkit with Torch -> CUDA (+ more DSLs) ☆792 · Updated 3 weeks ago
- Accelerating MoE with IO and Tile-aware Optimizations ☆569 · Updated 3 weeks ago
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆219 · Updated last week
- 🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash… ☆280 · Updated 2 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆228 · Updated this week
- Perplexity GPU Kernels ☆560 · Updated 3 months ago
- Training library for Megatron-based models with bidirectional Hugging Face conversion capability ☆419 · Updated this week
- ☆232 · Updated 2 months ago
- Cataloging released Triton kernels. ☆292 · Updated 5 months ago
- Load compute kernels from the Hub ☆397 · Updated this week
- Fast low-bit matmul kernels in Triton ☆427 · Updated last week
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo) ☆475 · Updated last week
- PyTorch-native distributed training library for LLMs/VLMs with out-of-the-box Hugging Face support ☆288 · Updated this week
- Code for data-aware compression of DeepSeek models ☆70 · Updated 2 months ago
- A curated collection of resources, tutorials, and best practices for learning and mastering NVIDIA CUTLASS ☆251 · Updated 9 months ago
- Miles is an enterprise-facing reinforcement learning framework for LLM and VLM post-training, forked from and co-evolving with slime. ☆830 · Updated last week
- Scalable toolkit for efficient model reinforcement ☆1,293 · Updated last week
- Efficient LLM Inference over Long Sequences ☆394 · Updated 7 months ago
- Best practices for training DeepSeek, Mixtral, Qwen and other MoE models using Megatron Core. ☆161 · Updated 2 weeks ago
- FlexAttention-based, minimal vLLM-style inference engine for fast Gemma 2 inference. ☆334 · Updated 3 months ago
- JAX backend for SGL ☆234 · Updated this week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆273 · Updated last week
- kernels, of the mega variety ☆665 · Updated last week
- ☆286 · Updated last week
- A Quirky Assortment of CuTe Kernels ☆781 · Updated this week
- ☆236 · Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆228 · Updated this week