oKatanaaa / CudaCythonSamples
This repository contains examples of CUDA usage in Cython code.
☆24 (updated 3 years ago)
Alternatives and similar repositories for CudaCythonSamples
Users who are interested in CudaCythonSamples are comparing it to the libraries listed below.
- NVIDIA Math Libraries for the Python Ecosystem ☆318 (updated 2 months ago)
- An Online Deep Learning Interface for HPC programs on NVIDIA GPUs ☆168 (updated 3 weeks ago)
- Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings. ☆47 (updated last week)
- Zero-copy MPI communication of JAX arrays, for turbo-charged HPC applications in Python ☆476 (updated 2 months ago)
- The CUDA target for Numba ☆128 (updated this week)
- GPU accelerated multigrid library for Python ☆59 (updated 8 months ago)
- Exploring using stdpar and Cython ☆33 (updated 4 years ago)
- Data Parallel Extension for Numba ☆81 (updated 6 months ago)
- Fusing Taichi into PyTorch ☆145 (updated 2 years ago)
- Template for GPU accelerated python libraries ☆48 (updated last year)
- OpenMP for Python in Numba ☆106 (updated last month)
- Extending JAX with custom C++ and CUDA code ☆394 (updated 9 months ago)
- An example combining scikit-build and pybind11 ☆129 (updated this week)
- Orthogonal polynomials in all shapes and sizes. ☆184 (updated last year)
- S2FFT: Differentiable and accelerated spherical transforms ☆192 (updated this week)
- Legate Sparse is a Legate library that aims to provide a distributed and accelerated drop-in replacement for the scipy.sparse library on … ☆21 (updated this week)
- A JAX-based research framework for differentiable and parallelizable acoustic simulations, on CPU, GPUs and TPUs ☆168 (updated 8 months ago)
- Data Parallel Extension for NumPy ☆108 (updated this week)
- Profiling with NVIDIA Nsight Tools Bootcamp ☆12 (updated last year)
- Python Module for PyTorch Tensor Visualisation in CUDA Eliminating CPU Transfer ☆38 (updated 4 months ago)
- Numerical integration in arbitrary dimensions on the GPU using PyTorch / TF / JAX ☆203 (updated 6 months ago)
- ☆57 (updated 3 weeks ago)
- Example to build PyTorch CUDA extension using CMake (with pybind11 and scikit-build) ☆11 (updated 5 years ago)
- Nonuniform fast Fourier transforms of types 1 and 2, in 1D, 2D, and 3D, on the GPU ☆87 (updated last year)
- Implemented the forward mode of automatic differentiation with the help of dual numbers using Python. ☆20 (updated 2 years ago)
- An easily integrable Cholesky solver on CPU and GPU ☆235 (updated 6 months ago)
- Wraps PyTorch code in a JIT-compatible way for JAX. Supports automatically defining gradients for reverse-mode AutoDiff. ☆53 (updated last month)
- A sparse KLU solver for PyTorch. ☆65 (updated 3 years ago)
- GPU/TPU accelerated nonlinear least-squares curve fitting using JAX ☆57 (updated last year)
- PyTorch implementation of Levenberg-Marquardt training algorithm ☆63 (updated 2 months ago)