CentML / DeepView.Profile
Interactive performance profiling and debugging tool for PyTorch neural networks.
☆61 · Updated 3 months ago
Alternatives and similar repositories for DeepView.Profile:
Users who are interested in DeepView.Profile compare it to the libraries listed below.
- Extensible collectives library in Triton · ☆86 · Updated last month
- Home for OctoML PyTorch Profiler · ☆113 · Updated 2 years ago
- ☆79 · Updated 6 months ago
- ☆104 · Updated 8 months ago
- A schedule language for large model training · ☆146 · Updated 10 months ago
- PyTorch centric eager mode debugger · ☆47 · Updated 4 months ago
- Applied AI experiments and examples for PyTorch · ☆264 · Updated last week
- A Python library that transfers PyTorch tensors between CPU and NVMe · ☆115 · Updated 5 months ago
- Framework to reduce autotune overhead to zero for well-known deployments. · ☆70 · Updated last week
- Ring-attention experiments · ☆140 · Updated 6 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… · ☆157 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton · ☆297 · Updated this week
- ☆27 · Updated 3 months ago
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI. · ☆131 · Updated last year
- Boosting 4-bit inference kernels with 2:4 Sparsity · ☆73 · Updated 8 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance. · ☆122 · Updated this week
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. · ☆194 · Updated this week
- High-Performance SGEMM on CUDA devices · ☆90 · Updated 3 months ago
- ☆202 · Updated 2 weeks ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes · ☆36 · Updated last year
- ☆68 · Updated last month
- Write a fast kernel and run it on Discord. See how you compare against the best! · ☆44 · Updated this week
- Memory Optimizations for Deep Learning (ICML 2023) · ☆64 · Updated last year
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration · ☆209 · Updated 5 months ago
- ☆26 · Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. · ☆84 · Updated last week
- ☆13 · Updated 2 months ago
- MLIR-based partitioning system · ☆82 · Updated this week
- A library to analyze PyTorch traces. · ☆368 · Updated last week
- A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL · ☆19 · Updated last week