CentML / DeepView.Profile
Interactive performance profiling and debugging tool for PyTorch neural networks.
☆55 · Updated this week
Related projects
Alternatives and complementary repositories for DeepView.Profile
- Applied AI experiments and examples for PyTorch ☆168 · Updated 3 weeks ago
- ☆90 · Updated 2 months ago
- extensible collectives library in triton ☆72 · Updated 2 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆146 · Updated this week
- A schedule language for large model training ☆141 · Updated 5 months ago
- This repository contains the experimental PyTorch native float8 training UX ☆212 · Updated 3 months ago
- ☆49 · Updated 2 weeks ago
- Home for OctoML PyTorch Profiler ☆107 · Updated last year
- Collection of kernels written in Triton language ☆69 · Updated 3 weeks ago
- ☆55 · Updated 6 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆60 · Updated 8 months ago
- Cataloging released Triton kernels. ☆138 · Updated 2 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆98 · Updated this week
- ☆153 · Updated this week
- Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ☆166 · Updated this week
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆187 · Updated this week
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆147 · Updated this week
- ☆169 · Updated 4 months ago
- Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5). ☆211 · Updated 3 weeks ago
- A library to analyze PyTorch traces. ☆306 · Updated this week
- High-speed GEMV kernels, at most 2.7x speedup compared to pytorch baseline. ☆90 · Updated 4 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆51 · Updated this week
- ☆149 · Updated 5 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆107 · Updated last year
- GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM ☆149 · Updated 4 months ago
- ring-attention experiments ☆97 · Updated last month
- Efficient, Flexible and Portable Structured Generation ☆125 · Updated this week
- ML model training for edge devices ☆157 · Updated last year
- Fast Hadamard transform in CUDA, with a PyTorch interface ☆111 · Updated 6 months ago
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters. ☆34 · Updated 2 years ago