cmpute / pytorch-cmake-example
Example of building a PyTorch CUDA extension with CMake (using pybind11 and scikit-build)
☆11 · Updated 5 years ago
Alternatives and similar repositories for pytorch-cmake-example
Users who are interested in pytorch-cmake-example are comparing it to the libraries listed below.
- ☆32 · Updated 4 years ago
- CUDA Matrix Multiplication Optimization ☆202 · Updated 11 months ago
- A library of GPU kernels for sparse matrix operations. ☆270 · Updated 4 years ago
- Step-by-step optimization of CUDA SGEMM ☆355 · Updated 3 years ago
- The NVIDIA® Tools Extension SDK (NVTX) is a C-based Application Programming Interface (API) for annotating events, code ranges, and resou… ☆415 · Updated this week
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆343 · Updated this week
- Kernel Tuner ☆353 · Updated this week
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆816 · Updated 10 months ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆71 · Updated 4 years ago
- ☆554 · Updated last week
- NVIDIA Math Libraries for the Python Ecosystem ☆333 · Updated last week
- ☆169 · Updated last year
- Training material for Nsight developer tools ☆161 · Updated 11 months ago
- Fast CUDA matrix multiplication from scratch ☆764 · Updated last year
- CUDA Kernel Benchmarking Library ☆682 · Updated last week
- Official Problem Sets / Reference Kernels for the GPU MODE Leaderboard! ☆67 · Updated this week
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆138 · Updated 4 years ago
- An extension library of WMMA API (Tensor Core API) ☆99 · Updated last year
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆91 · Updated last year
- ☆216 · Updated last year
- ☆225 · Updated this week
- A simple high performance CUDA GEMM implementation. ☆386 · Updated last year
- Template for GPU accelerated python libraries ☆49 · Updated last year
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆236 · Updated 10 months ago
- depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile. ☆701 · Updated 2 months ago
- ☆168 · Updated 11 months ago
- ☆59 · Updated 10 months ago
- A Python-embedded DSL that makes it easy to write fast, scalable ML kernels with minimal boilerplate. ☆187 · Updated this week
- Template for starting a CUDA/C++ project using CMake with GitHub Actions for CI ☆31 · Updated 3 weeks ago
- CUTLASS and CuTe Examples ☆63 · Updated this week