ollewelin / Installing-and-Test-PyTorch-C-API-on-Ubuntu-with-GPU-enabled
Installing and Testing the PyTorch C++ API on Ubuntu with GPU Enabled
☆24 · Updated last year
Alternatives and similar repositories for Installing-and-Test-PyTorch-C-API-on-Ubuntu-with-GPU-enabled:
Users who are interested in Installing-and-Test-PyTorch-C-API-on-Ubuntu-with-GPU-enabled are comparing it to the repositories listed below.
- Some CUDA design patterns and a bit of template magic for CUDA ☆150 · Updated last year
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆89 · Updated last year
- Serial and parallel implementations of matrix multiplication ☆40 · Updated 4 years ago
- Programming accelerated applications with CUDA C/C++, enough to be able to begin work accelerating your own CPU-only applications for per… ☆92 · Updated 6 years ago
- Quick and Self-Contained TensorRT Custom Plugin Implementation and Integration ☆54 · Updated 9 months ago
- An expression template based linear algebra library running completely on the GPU using CUDA ☆25 · Updated 3 years ago
- Learning and practice of high performance computing (CUDA, Vulkan, OpenCL, OpenMP, TBB, SSE/AVX, NEON, MPI, coroutines, etc.) ☆60 · Updated last week
- Source code examples from the Parallel Forall Blog ☆96 · Updated 6 years ago
- High-Performance Computing: CPU Instructions, GPU OpenCL & CUDA, etc. ☆14 · Updated 10 months ago
- Implement Neural Networks in CUDA from Scratch ☆22 · Updated 10 months ago
- How to use CUDA with Python numpy ☆38 · Updated 7 years ago
- A small example of using new PyTorch C++ frontend to implement ResNet ☆43 · Updated 6 years ago
- Learn OpenMP examples step by step ☆91 · Updated 2 months ago
- C++20 N-dimensional Matrix class for hobby project ☆23 · Updated 3 years ago
- Tutorial for wrapping C++ library into Python using pybind11 and CMake ☆143 · Updated last year
- ResNet Implementation, Training, and Inference Using LibTorch C++ API ☆39 · Updated 9 months ago
- ☆22 · Updated 10 months ago
- THIS REPOSITORY HAS MOVED TO github.com/nvidia/cub, WHICH IS AUTOMATICALLY MIRRORED HERE. ☆84 · Updated last year
- Image Filtering using CUDA ☆25 · Updated 6 years ago
- CUDA Matrix Multiplication Optimization ☆177 · Updated 8 months ago
- Implementing CNN for Digit Recognition (MNIST and SVHN dataset) using PyTorch C++ API ☆24 · Updated 2 years ago
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU ☆30 · Updated last week
- Swin Transformer C++ Implementation ☆62 · Updated 3 years ago
- Learning CUDA 10 Programming, published by Packt ☆41 · Updated 2 years ago
- Examples from Programming in Parallel with CUDA ☆131 · Updated 2 years ago
- ☆16 · Updated last year
- A detailed conversion of a C++ project to Python using pybind11 ☆18 · Updated 3 years ago
- cuDNN sample codes provided by Nvidia ☆45 · Updated 6 years ago
- CUDA implementation of the fundamental sum reduce operation. Aims to be as optimized as reasonable. ☆36 · Updated 7 years ago
- CUDA 8-bit Tensor Core Matrix Multiplication based on m16n16k16 WMMA API ☆28 · Updated last year