CentML / DeepView.Profile
🏙 Interactive performance profiling and debugging tool for PyTorch neural networks.
☆58 · Updated 2 months ago
Alternatives and similar repositories for DeepView.Profile:
Users interested in DeepView.Profile are comparing it to the libraries listed below.
- ☆101 · Updated 6 months ago
- Extensible collectives library in Triton ☆84 · Updated 6 months ago
- PyTorch-centric eager-mode debugger ☆46 · Updated 3 months ago
- This repository contains the experimental PyTorch-native float8 training UX ☆222 · Updated 7 months ago
- ☆62 · Updated 3 weeks ago
- A schedule language for large model training ☆145 · Updated 9 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 · Updated last year
- Fast Matrix Multiplications for Lookup Table-Quantized LLMs ☆234 · Updated last month
- Ring-attention experiments ☆127 · Updated 5 months ago
- Boosting 4-bit inference kernels with 2:4 sparsity ☆71 · Updated 6 months ago
- Home for the OctoML PyTorch Profiler ☆108 · Updated last year
- Demo of the unit_scaling library, showing how a model can be easily adapted to train in FP8 ☆45 · Updated 8 months ago
- Tritonbench is a collection of PyTorch custom operators with example inputs to measure their performance ☆104 · Updated this week
- Fast low-bit matmul kernels in Triton ☆267 · Updated this week
- Applied AI experiments and examples for PyTorch ☆249 · Updated this week
- Make Triton easier ☆47 · Updated 9 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆91 · Updated this week
- Repository for sparse fine-tuning of LLMs via a modified version of the MosaicML llmfoundry ☆40 · Updated last year
- ☆191 · Updated this week
- Cataloging released Triton kernels ☆204 · Updated 2 months ago
- A minimal implementation of vLLM ☆36 · Updated 7 months ago
- A framework for PyTorch to enable fault management for collective communication libraries (CCL) such as NCCL ☆19 · Updated this week
- Framework to reduce autotune overhead to zero for well-known deployments ☆63 · Updated last week
- Compression for foundation models ☆27 · Updated last month
- ☆73 · Updated 4 months ago
- Repository for CPU kernel generation for LLM inference ☆25 · Updated last year
- LLM serving performance evaluation harness ☆70 · Updated 3 weeks ago
- [ICLR'25] Fast Inference of MoE Models with CPU-GPU Orchestration ☆201 · Updated 4 months ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline ☆101 · Updated 8 months ago
- ☆27 · Updated 2 months ago