CentML / DeepView.Profile
Interactive performance profiling and debugging tool for PyTorch neural networks.
☆58 · Updated last month
Alternatives and similar repositories for DeepView.Profile:
Users who are interested in DeepView.Profile compare it to the libraries listed below.
- extensible collectives library in triton ☆83 · Updated 4 months ago
- ☆67 · Updated 3 months ago
- A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind… ☆154 · Updated 2 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆104 · Updated 2 months ago
- Cataloging released Triton kernels. ☆168 · Updated last month
- PyTorch centric eager mode debugger ☆46 · Updated 2 months ago
- ☆100 · Updated 5 months ago
- Home for OctoML PyTorch Profiler ☆107 · Updated last year
- A schedule language for large model training ☆144 · Updated 8 months ago
- Applied AI experiments and examples for PyTorch ☆225 · Updated this week
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆18 · Updated this week
- ☆59 · Updated 2 weeks ago
- ☆180 · Updated this week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆183 · Updated this week
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8. ☆43 · Updated 7 months ago
- Collection of kernels written in Triton language ☆105 · Updated this week
- This repository contains the experimental PyTorch native float8 training UX ☆221 · Updated 6 months ago
- A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL ☆19 · Updated this week
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆34 · Updated 9 months ago
- Torch Distributed Experimental ☆115 · Updated 6 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated 11 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 · Updated 5 months ago
- Fast low-bit matmul kernels in Triton ☆238 · Updated this week
- ring-attention experiments ☆123 · Updated 4 months ago
- ☆25 · Updated last month
- Docker image for NVIDIA GH200 machines - optimized for vllm serving and hf trainer finetuning ☆35 · Updated this week
- End to End steps for adding custom ops in PyTorch. ☆20 · Updated 4 years ago
- Repository for CPU Kernel Generation for LLM Inference ☆25 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆77 · Updated this week
- Fairring (FAIR + Herring) is a plug-in for PyTorch that provides a process group for distributed training that outperforms NCCL at large … ☆64 · Updated 2 years ago