Quansight / torch-build
Collection of scripts to build PyTorch and the domain libraries from source.
☆11Updated last week
Alternatives and similar repositories for torch-build
Users who are interested in torch-build compare it with the libraries listed below.
- Einsum optimization using opt_einsum and PyTorch FX graph rewriting☆21Updated 3 years ago
- A tracing JIT compiler for PyTorch☆13Updated 3 years ago
- POC work on MLIR backend☆55Updated 10 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries.☆61Updated 2 months ago
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.☆47Updated this week
- MLIR-based partitioning system☆97Updated this week
- NPBench - A Benchmarking Suite for High-Performance NumPy☆81Updated last month
- A lightweight, Pythonic, frontend for MLIR☆81Updated last year
- A tracing JIT for PyTorch☆17Updated 2 years ago
- ☆21Updated 3 months ago
- ☆52Updated 10 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆43Updated 3 months ago
- Python bindings for OpenSHMEM☆17Updated 2 months ago
- Worked example of the process from Python source to CUDA kernel execution with Numba☆41Updated 9 months ago
- pytorch ucc plugin☆22Updated 3 years ago
- ☆16Updated 9 months ago
- An IR for efficiently simulating distributed ML computation.☆28Updated last year
- ROCm SPARSE marshalling library☆67Updated this week
- ☆22Updated this week
- A CUTLASS implementation using SYCL☆27Updated this week
- Analyze graph/hierarchical performance data using pandas dataframes☆115Updated 4 months ago
- TORCH_LOGS parser for PT2☆43Updated 3 weeks ago
- An MLIR frontend for tensor expressions☆25Updated 4 years ago
- ☆13Updated 4 years ago
- ROCm Systems Profiler☆20Updated this week
- Python bindings for UCX☆135Updated last week
- Matrix multiplication on GPUs for matrices stored on a CPU. Similar to cublasXt, but ported to both NVIDIA and AMD GPUs.☆33Updated 2 months ago
- ☆28Updated 5 months ago
- Open source cross-platform compiler for compute-intensive loops used in AI algorithms, from Microsoft Research☆109Updated last year
- Benchmarks to capture important workloads.☆31Updated 4 months ago