Quansight / torch-build
Collection of scripts to build PyTorch and the domain libraries from source.
☆12Updated last month
Alternatives and similar repositories for torch-build
Users who are interested in torch-build compare it to the libraries listed below.
- NPBench - A Benchmarking Suite for High-Performance NumPy☆85Updated 2 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.☆47Updated this week
- MLIR-based partitioning system☆103Updated this week
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆43Updated 4 months ago
- TORCH_LOGS parser for PT2☆46Updated this week
- The CUDA target for Numba☆149Updated last week
- No-GIL Python environment featuring NVIDIA Deep Learning libraries.☆62Updated 3 months ago
- A tracing JIT compiler for PyTorch☆13Updated 3 years ago
- A lightweight, Pythonic, frontend for MLIR☆80Updated last year
- A CUTLASS implementation using SYCL☆30Updated last week
- ☆48Updated this week
- A Data-Centric Compiler for Machine Learning☆84Updated last year
- Bandwidth test for ROCm☆60Updated this week
- NVIDIA's launch, startup, and logging scripts used by our MLPerf Training and HPC submissions☆27Updated last week
- Cosmic Tagging Network for Neutrino Physics☆13Updated last year
- Analyze graph/hierarchical performance data using pandas dataframes☆116Updated 5 months ago
- ☆16Updated 2 years ago
- LLM training in simple, raw C/CUDA☆99Updated last year
- Material for the SC22 Deep Learning at Scale Tutorial☆41Updated 2 years ago
- ☆21Updated 4 months ago
- Reference implementations of MLPerf™ HPC training benchmarks☆48Updated 4 months ago
- Graph-indexed Pandas DataFrames for analyzing hierarchical performance data☆34Updated last week
- ☆23Updated last week
- ☆31Updated this week
- MatMul Performance Benchmarks for a Single CPU Core comparing both hand-engineered and codegen kernels.☆133Updated last year
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆33Updated 3 months ago
- A tracing infrastructure for heterogeneous computing applications.☆33Updated this week
- COCCL: Compression and precision co-aware collective communication library☆24Updated 4 months ago
- pytorch ucc plugin☆22Updated 4 years ago
- Advanced Profiling and Analytics for AMD Hardware☆159Updated this week