CisMine / Setup-as-Cuda-programmers
Setup Cuda
☆20 · Updated 5 months ago
Related projects
Alternatives and complementary repositories for Setup-as-Cuda-programmers
- NVIDIA tools guide ☆71 · Updated 3 months ago
- Implement Neural Networks in Cuda from Scratch ☆22 · Updated 6 months ago
- ☆133 · Updated 9 months ago
- Personal notes on CUDA programming ☆51 · Updated last year
- Read custom dataset ☆11 · Updated last year
- Learning about CUDA by writing PTX code. ☆29 · Updated 8 months ago
- ☆55 · Updated last week
- CUDA Learning guide ☆256 · Updated 5 months ago
- A set of hands-on tutorials for CUDA programming ☆194 · Updated 7 months ago
- Custom kernels in Triton language for accelerating LLMs ☆17 · Updated 7 months ago
- CUDA Matrix Multiplication Optimization ☆141 · Updated 4 months ago
- ☆83 · Updated 8 months ago
- Cataloging released Triton kernels. ☆138 · Updated 2 months ago
- From zero to hero CUDA for accelerating maths and machine learning on GPU. ☆171 · Updated 3 months ago
- ☆19 · Updated 3 weeks ago
- Examples from Programming in Parallel with CUDA ☆108 · Updated last year
- ☆52 · Updated 11 months ago
- Simple and fast low-bit matmul kernels in CUDA / Triton ☆145 · Updated last week
- A plugin for Jupyter Notebook to run CUDA C/C++ code ☆201 · Updated 2 months ago
- Learn CUDA with PyTorch ☆14 · Updated 2 weeks ago
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆616 · Updated 3 months ago
- Step-by-step optimization of CUDA SGEMM ☆242 · Updated 2 years ago
- SYCL implementation of Fused MLPs for Intel GPUs ☆43 · Updated 3 weeks ago
- A user-friendly tool chain that enables the seamless execution of ONNX models using JAX as the backend. ☆98 · Updated 2 months ago
- NVIDIA HPCG is based on the HPCG benchmark and optimized for performance on NVIDIA accelerated HPC systems. ☆44 · Updated last month
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆45 · Updated 3 years ago
- Fast CUDA matrix multiplication from scratch ☆482 · Updated 10 months ago
- Two implementations of ZeRO-1 optimizer sharding in JAX ☆13 · Updated last year
- Examples from the "C++ From Scratch" Series ☆65 · Updated last year
- Extensible collectives library in Triton ☆72 · Updated 2 months ago