Demonstration of various hardware effects on CUDA GPUs.
☆392 · Nov 22, 2023 · Updated 2 years ago
Alternatives and similar repositories for hardware-effects-gpu
Users that are interested in hardware-effects-gpu are comparing it to the libraries listed below
- Demonstration of various hardware effects. · ☆2,970 · Feb 29, 2024 · Updated 2 years ago
- Collection of benchmarks to measure basic GPU capabilities · ☆498 · Oct 24, 2025 · Updated 4 months ago
- CUDA Kernel Benchmarking Library · ☆827 · Feb 28, 2026 · Updated last week
- [ARCHIVED] Cooperative primitives for CUDA C++. See https://github.com/NVIDIA/cccl · ☆1,820 · Oct 9, 2023 · Updated 2 years ago
- Vulkan debug layer to visualize synchronization dependencies · ☆44 · Oct 21, 2019 · Updated 6 years ago
- C++ implementation of a fast and memory-efficient hash map and hash set specialized for strings · ☆185 · Nov 2, 2025 · Updated 4 months ago
- A flexible and efficient deep neural network (DNN) compiler that generates high-performance executables from a DNN model description. · ☆1,005 · Sep 19, 2024 · Updated last year
- Tile primitives for speedy kernels · ☆3,218 · Updated this week
- Examples demonstrating available options to program multiple GPUs in a single node or a cluster · ☆870 · Sep 26, 2025 · Updated 5 months ago
- ☆1,995 · Jul 29, 2023 · Updated 2 years ago
- Multi-target compiler for Sum-Product Networks, based on MLIR and LLVM. · ☆25 · Nov 29, 2024 · Updated last year
- CUDA Templates and Python DSLs for High-Performance Linear Algebra · ☆9,389 · Updated this week
- A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis. · ☆57 · Mar 20, 2025 · Updated 11 months ago
- A source-to-source translator for OpenACC to OpenMP. · ☆16 · May 18, 2021 · Updated 4 years ago
- Structured PIC proxy app based on Cabana · ☆15 · Jun 30, 2025 · Updated 8 months ago
- CUDA Core Compute Libraries · ☆2,196 · Mar 5, 2026 · Updated last week
- ☆260 · Jul 11, 2024 · Updated last year
- The code for our paper "Neural Architecture Search as Program Transformation Exploration" · ☆16 · Apr 28, 2021 · Updated 4 years ago
- Testing various BCn texture format decoding libraries · ☆17 · Sep 19, 2022 · Updated 3 years ago
- Assembler for NVIDIA Maxwell architecture · ☆1,059 · Jan 3, 2023 · Updated 3 years ago
- [ARCHIVED] The C++ Standard Library for your entire system. See https://github.com/NVIDIA/cccl · ☆2,307 · Feb 7, 2024 · Updated 2 years ago
- Inspect floating point computations · ☆144 · Jul 25, 2021 · Updated 4 years ago
- MSCCL++: A GPU-driven communication stack for scalable AI applications · ☆476 · Updated this week
- Decoding Attention is specially optimized for MHA, MQA, GQA and MLA using CUDA cores for the decoding stage of LLM inference. · ☆46 · Jun 11, 2025 · Updated 9 months ago
- Several optimization methods of half-precision general matrix multiplication (HGEMM) using Tensor Cores with the WMMA API and MMA PTX instruct… · ☆529 · Sep 8, 2024 · Updated last year
- Awesome resources for GPUs · ☆609 · Jul 1, 2023 · Updated 2 years ago
- ☆11 · Mar 22, 2023 · Updated 2 years ago
- A dynamic GPU memory allocator, suitable for warp synchronized scenarios. · ☆11 · Aug 20, 2019 · Updated 6 years ago
- Faster SSE and new AVX/AVX2 software intrinsics to use with the Windows 11 SDK for easier porting to Windows on ARM · ☆14 · Dec 22, 2025 · Updated 2 months ago
- Fork of icculus's HLSL Bytecode -> GLSL translator with tweaks for MonoGame · ☆15 · May 28, 2023 · Updated 2 years ago
- Mirage Persistent Kernel: Compiling LLMs into a MegaKernel · ☆2,148 · Feb 23, 2026 · Updated 2 weeks ago
- A Low-Level Abstraction of Memory Access · ☆93 · Feb 29, 2024 · Updated 2 years ago
- A single-header C++ library for simplifying the use of CUDA Runtime Compilation (NVRTC). · ☆569 · Sep 15, 2025 · Updated 5 months ago
- DWARF-based stack walks with eBPF · ☆13 · Aug 18, 2021 · Updated 4 years ago
- ROS-Industrial REPs · ☆14 · May 23, 2021 · Updated 4 years ago
- The repository contains container recipes to build the entire stack of Xeus-Cling and Cling including CUDA extension with just a few comm… · ☆10 · Dec 22, 2020 · Updated 5 years ago
- Real-time GPU profiling layer for Vulkan applications. · ☆87 · Mar 4, 2026 · Updated last week
- Implementing different methods of circle-to-circle collision detection using a variety of new technologies: Vulkan Graphics/Compute API, AV… · ☆62 · Dec 28, 2020 · Updated 5 years ago
- Simple, fast, accurate single-header microbenchmarking functionality for C++11/14/17/20 · ☆1,675 · Oct 6, 2024 · Updated last year