CentML / DeepView.Profile
Interactive performance profiling and debugging tool for PyTorch neural networks.
☆64 Updated last year
Alternatives and similar repositories for DeepView.Profile
Users who are interested in DeepView.Profile compare it to the libraries listed below.
- Home for OctoML PyTorch Profiler ☆114 Updated 2 years ago
- Ship correct and fast LLM kernels to PyTorch ☆135 Updated last week
- ☆115 Updated last year
- extensible collectives library in triton ☆93 Updated 9 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆164 Updated 2 weeks ago
- ML model training for edge devices ☆168 Updated 2 years ago
- Parallel framework for training and fine-tuning deep neural networks ☆70 Updated 2 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆125 Updated last year
- ☆71 Updated 10 months ago
- ring-attention experiments ☆163 Updated last year
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆46 Updated last year
- A schedule language for large model training ☆152 Updated 5 months ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 Updated this week
- Fast low-bit matmul kernels in Triton ☆423 Updated last month
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆257 Updated last year
- Memory Optimizations for Deep Learning (ICML 2023) ☆114 Updated last year
- Triton-based Symmetric Memory operators and examples ☆79 Updated last week
- This repository contains the experimental PyTorch native float8 training UX ☆227 Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆234 Updated last week
- A bunch of kernels that might make stuff slower ☆75 Updated this week
- ☆343 Updated 3 weeks ago
- TritonParse: A Compiler Tracer, Visualizer, and Reproducer for Triton Kernels ☆186 Updated this week
- ☆252 Updated last year
- ☆273 Updated this week
- JaxPP is a library for JAX that enables flexible MPMD pipeline parallelism for large-scale LLM training ☆63 Updated this week
- Applied AI experiments and examples for PyTorch ☆314 Updated 5 months ago
- Collection of kernels written in Triton language ☆175 Updated 9 months ago
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆218 Updated this week
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆93 Updated last year
- Evaluating Large Language Models for CUDA Code Generation. ComputeEval is a framework designed to generate and evaluate CUDA code from Lar… ☆92 Updated 2 weeks ago