A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆57Mar 20, 2025Updated 11 months ago
Alternatives and similar repositories for PTXprofiler
Users who are interested in PTXprofiler are comparing it to the repositories listed below
- A Top-Down Profiler for GPU Applications☆22Feb 29, 2024Updated 2 years ago
- ☆11Jun 9, 2023Updated 2 years ago
- A parser for PTX 6.5☆13Jun 19, 2023Updated 2 years ago
- ☆10May 12, 2022Updated 3 years ago
- outline and links for PLDI 2022 tutorial☆17Jun 13, 2022Updated 3 years ago
- PTX-EMU is a simple emulator for CUDA programs.☆37Apr 25, 2025Updated 10 months ago
- study of cutlass☆22Nov 10, 2024Updated last year
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line☆24Nov 25, 2025Updated 3 months ago
- ☆11May 24, 2020Updated 5 years ago
- ngAP's artifact for ASPLOS'24☆25Jul 29, 2025Updated 7 months ago
- A Throughput-Optimized Pipeline Parallel Inference System for Large Language Models☆47Dec 24, 2025Updated 2 months ago
- A tool for cross-checking Verilog compilers☆14Apr 16, 2025Updated 10 months ago
- Indexed Allocator C++ lib☆12Nov 6, 2019Updated 6 years ago
- Speeding Up Your Python Codes 1000x☆12Apr 2, 2025Updated 11 months ago
- A library for working with the APIs of binary options brokers☆15Jun 7, 2021Updated 4 years ago
- Generate Linux Perf event tables for Apple Silicon☆17Dec 16, 2025Updated 2 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆27Oct 13, 2024Updated last year
- Non-nullable pointers with 0-overhead and no hidden runtime cost.☆12Dec 21, 2020Updated 5 years ago
- Header only C++11 glTF 2.0 loader☆14Apr 26, 2019Updated 6 years ago
- RISC-V-based many-core neuromorphic architecture☆15Aug 3, 2025Updated 6 months ago
- C++ Memory allocator for packet queues that free() in roughly the same order that they alloc().☆16Mar 15, 2018Updated 7 years ago
- Wicked fast, thread safe in-memory key/object store for C++☆12Dec 8, 2016Updated 9 years ago
- Customisable, thread-safe C11 memory allocator based off the K&R "storage allocator"☆14Jul 5, 2024Updated last year
- In order to solve the actual vehicle routing problem, an intelligent order allocation algorithm for booking trips online between cities i…☆10May 5, 2020Updated 5 years ago
- Cheap: customized heaps for improved application performance.☆28Oct 11, 2022Updated 3 years ago
- A small OpenCL benchmark program to measure peak GPU/CPU performance.☆281Feb 18, 2026Updated last week
- ☆59Feb 5, 2026Updated 3 weeks ago
- Extensions for the Visual Studio C++/CLI marshaling framework☆17Aug 25, 2014Updated 11 years ago
- Physically Based Shading for Call of Pripyat☆10Jun 10, 2023Updated 2 years ago
- CUDA grammar for tree-sitter☆33Nov 23, 2025Updated 3 months ago
- wrappers for tarantool small allocated objects☆12Dec 13, 2019Updated 6 years ago
- A modern C++-RAII utility library, based on the C++20 proposal☆12Mar 28, 2020Updated 5 years ago
- Vector Bazel Rules and Toolchains☆14Feb 18, 2026Updated last week
- Experimental Dx12 engine with Voxel cone tracing for realtime GI☆12Feb 22, 2026Updated last week
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code☆15Mar 19, 2023Updated 2 years ago
- Data structures for ASTs☆14Dec 6, 2022Updated 3 years ago
- A useful tool to edit and visualize shaders in real time.☆10Aug 5, 2021Updated 4 years ago
- Dynamic Car Dashboard Gauge with Qt QML☆16Aug 12, 2023Updated 2 years ago
- Minimal implementation of malloc and free for demo purposes.☆12Mar 21, 2022Updated 3 years ago