andreinechaev / nvcc4jupyter
A plugin for Jupyter Notebook to run CUDA C/C++ code
☆212 Updated 5 months ago
Alternatives and similar repositories for nvcc4jupyter:
Users who are interested in nvcc4jupyter are comparing it to the repositories listed below.
- CUDA Matrix Multiplication Optimization ☆161 Updated 7 months ago
- CUDA Kernel Benchmarking Library ☆560 Updated 2 months ago
- Step-by-step optimization of CUDA SGEMM ☆284 Updated 2 years ago
- NVIDIA tools guide ☆101 Updated last month
- Fast CUDA matrix multiplication from scratch ☆632 Updated last year
- ☆179 Updated this week
- Fastest kernels written from scratch ☆170 Updated this week
- KernelBench: Can LLMs Write GPU Kernels? - Benchmark with Torch -> CUDA problems ☆172 Updated last week
- Samples demonstrating how to use the Compute Sanitizer Tools and Public API ☆75 Updated last year
- CUDA Learning guide ☆323 Updated 7 months ago
- Training materials associated with NVIDIA's CUDA Training Series (www.olcf.ornl.gov/cuda-training-series/) ☆699 Updated 6 months ago
- ☆181 Updated 7 months ago
- Training material for Nsight developer tools ☆147 Updated 6 months ago
- A Fusion Code Generator for NVIDIA GPUs (commonly known as "nvFuser") ☆303 Updated this week
- ☆123 Updated 6 months ago
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster ☆606 Updated 3 months ago
- Instructions, Docker images, and examples for Nsight Compute and Nsight Systems ☆130 Updated 4 years ago
- Experimental projects related to TensorRT ☆89 Updated this week
- Simple neural network implementation using CUDA technology. It is an educational implementation. ☆96 Updated 6 years ago
- NVIDIA Math Libraries for the Python Ecosystem ☆235 Updated 2 months ago
- An Easy-to-understand TensorOp Matmul Tutorial ☆316 Updated 4 months ago
- Cataloging released Triton kernels. ☆167 Updated last month
- Matrix Multiply-Accumulate with CUDA and WMMA (Tensor Core) ☆124 Updated 4 years ago
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T… ☆64 Updated 4 years ago
- ☆159 Updated 8 months ago
- Examples from Programming in Parallel with CUDA ☆122 Updated last year
- Applied AI experiments and examples for PyTorch ☆224 Updated this week
- LLM training in simple, raw C/CUDA ☆91 Updated 9 months ago
- High-Performance SGEMM on CUDA devices ☆74 Updated 3 weeks ago
- Composable Kernel: Performance Portable Programming Model for Machine Learning Tensor Operators ☆349 Updated this week