CentML / DeepView.Profile
Interactive performance profiling and debugging tool for PyTorch neural networks.
☆64 Updated 10 months ago
Alternatives and similar repositories for DeepView.Profile
Users who are interested in DeepView.Profile are comparing it to the libraries listed below.
- Home for OctoML PyTorch Profiler ☆114 Updated 2 years ago
- Ship correct and fast LLM kernels to PyTorch ☆125 Updated 3 weeks ago
- extensible collectives library in triton ☆91 Updated 8 months ago
- ☆113 Updated last year
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆161 Updated 2 months ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆177 Updated this week
- Memory Optimizations for Deep Learning (ICML 2023) ☆112 Updated last year
- ☆71 Updated 8 months ago
- ring-attention experiments ☆160 Updated last year
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆64 Updated last week
- Fast low-bit matmul kernels in Triton ☆402 Updated 2 weeks ago
- Evaluating Large Language Models for CUDA Code Generation ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆77 Updated 2 weeks ago
- This repository contains the experimental PyTorch native float8 training UX ☆226 Updated last year
- ☆96 Updated last year
- Nsight Python is a Python kernel profiling interface based on NVIDIA Nsight Tools ☆67 Updated this week
- A schedule language for large model training ☆151 Updated 3 months ago
- A Python library transfers PyTorch tensors between CPU and NVMe ☆122 Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆243 Updated last year
- Applied AI experiments and examples for PyTorch ☆309 Updated 3 months ago
- ☆120 Updated last year
- Collection of kernels written in Triton language ☆173 Updated 8 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆90 Updated 2 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆47 Updated 3 months ago
- ☆28 Updated 10 months ago
- ML model training for edge devices ☆167 Updated 2 years ago
- ☆257 Updated this week
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆152 Updated 2 years ago
- Triton-based Symmetric Memory operators and examples ☆65 Updated last month
- A library to analyze PyTorch traces. ☆443 Updated 3 weeks ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆58 Updated 3 weeks ago