CentML / DeepView.Profile

🏙 Interactive performance profiling and debugging tool for PyTorch neural networks.

☆64

Alternatives and similar repositories for DeepView.Profile

Users who are interested in DeepView.Profile are comparing it to the libraries listed below.

octoml / octoml-profile
Home for OctoML PyTorch Profiler
☆113 Updated 2 years ago
cchan / tccl
extensible collectives library in triton
☆88 Updated 4 months ago
pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆158 Updated last month
axonn-ai / axonn
A parallel framework for training deep neural networks
☆63 Updated 4 months ago
stanford-futuredata / stk
☆107 Updated 11 months ago
awslabs / slapo
A schedule language for large model training
☆149 Updated last year
neuralmagic / compressed-tensors
A safetensors extension to efficiently store sparse quantized tensors on disk
☆142 Updated this week
ShishirPatil / poet
ML model training for edge devices
☆165 Updated last year
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆206 Updated last week
mobiusml / gemlite
Fast low-bit matmul kernels in Triton
☆339 Updated this week
deepspeedai / DeepSpeed-Kernels
☆74 Updated 4 months ago
gpu-mode / discord-cluster-manager
Write a fast kernel and run it on Discord. See how you compare against the best!
☆48 Updated this week
efeslab / fiddler
[ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration
☆225 Updated 8 months ago
Jokeren / triton-samples
☆28 Updated 6 months ago
IST-DASLab / Sparse-Marlin
Boosting 4-bit inference kernels with 2:4 Sparsity
☆80 Updated 11 months ago
apple / ml-recurrent-drafter
☆215 Updated 6 months ago
IST-DASLab / qmoe
Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models".
☆277 Updated last year
google / aqt
☆323 Updated this week
pytorch-labs / applied-ai
Applied AI experiments and examples for PyTorch
☆289 Updated 2 months ago
pytorch-labs / triton-cpu
An experimental CPU backend for Triton (https://github.com/openai/triton)
☆43 Updated 4 months ago
pytorch-labs / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆224 Updated last year
pytorch-labs / tritonparse
TritonParse: A Compiler Tracer, Visualizer, and mini-Reproducer (WIP) for Triton Kernels
☆138 Updated this week
microsoft / varuna
☆251 Updated last year
facebookresearch / HolisticTraceAnalysis
A library to analyze PyTorch traces.
☆400 Updated last week
yandex-research / swarm
Official code for "SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient"
☆141 Updated last year
zinccat / Awesome-Triton-Kernels
Collection of kernels written in Triton language
☆142 Updated 4 months ago
ScalingIntelligence / hydragen
Hydragen: High-Throughput LLM Inference with Shared Prefixes
☆41 Updated last year
ezyang / torchdbg
PyTorch centric eager mode debugger
☆47 Updated 7 months ago
pytorch / helion
A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate.
☆212 Updated this week
mlc-ai / llm-perf-bench
☆120 Updated last year