NVIDIA / nsight-python
Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools
☆30Updated last week
Alternatives and similar repositories for nsight-python
Users who are interested in nsight-python are comparing it to the libraries listed below
- extensible collectives library in triton☆91Updated 8 months ago
- Ship correct and fast LLM kernels to PyTorch☆124Updated 2 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best!☆61Updated last week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI.☆151Updated 2 years ago
- ☆337Updated last week
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.☆640Updated this week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning.☆312Updated this week
- Experiment of using Tangent to autodiff triton☆80Updated last year
- Fault tolerance for PyTorch (HSDP, LocalSGD, DiLoCo, Streaming DiLoCo)☆454Updated 2 weeks ago
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference☆78Updated 2 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support.☆170Updated this week
- ☆21Updated 8 months ago
- jax-triton contains integrations between JAX and OpenAI Triton☆436Updated this week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training☆57Updated last week
- Fast low-bit matmul kernels in Triton☆401Updated last week
- This repository contains the experimental PyTorch native float8 training UX☆226Updated last year
- Automatic differentiation for Triton Kernels☆30Updated 3 months ago
- ☆184Updated last year
- ☆28Updated 10 months ago
- This repository hosts code that supports the testing infrastructure for the PyTorch organization. For example, this repo hosts the logic …☆103Updated this week
- ☆190Updated last week
- ☆256Updated this week
- Applied AI experiments and examples for PyTorch☆307Updated 3 months ago
- train with kittens!☆63Updated last year
- PyTorch RFCs (experimental)☆136Updated 6 months ago
- A bunch of kernels that might make stuff slower 😉☆65Updated last week
- ☆147Updated 3 weeks ago
- A library for unit scaling in PyTorch☆132Updated 4 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…☆161Updated 2 months ago
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8.☆46Updated last year