CentML / DeepView.Profile
Interactive performance profiling and debugging tool for PyTorch neural networks.
☆64 Updated last year
Alternatives and similar repositories for DeepView.Profile
Users who are interested in DeepView.Profile are comparing it to the libraries listed below.
- extensible collectives library in triton ☆95 Updated 10 months ago
- Home for OctoML PyTorch Profiler ☆113 Updated 2 years ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆164 Updated 3 weeks ago
- Ship correct and fast LLM kernels to PyTorch ☆140 Updated 3 weeks ago
- ML model training for edge devices ☆168 Updated 2 years ago
- ☆115 Updated last year
- ☆71 Updated 10 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆125 Updated last year
- ring-attention experiments ☆165 Updated last year
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆194 Updated this week
- Parallel framework for training and fine-tuning deep neural networks ☆70 Updated 2 months ago
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆63 Updated 2 weeks ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆227 Updated last year
- A schedule language for large model training ☆152 Updated 5 months ago
- Framework to reduce autotune overhead to zero for well known deployments. ☆95 Updated 4 months ago
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆96 Updated 3 weeks ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆219 Updated this week
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆260 Updated last year
- Triton-based Symmetric Memory operators and examples ☆81 Updated 3 weeks ago
- ☆286 Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆49 Updated 5 months ago
- A library to analyze PyTorch traces. ☆462 Updated this week
- ☆252 Updated last year
- ☆27 Updated 2 years ago
- Fast low-bit matmul kernels in Triton ☆427 Updated this week
- High-speed GEMV kernels, up to 2.7x speedup compared to the PyTorch baseline. ☆127 Updated last year
- ☆345 Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆238 Updated this week
- Collection of kernels written in Triton language ☆178 Updated last week