A simple profiler to count Nvidia PTX assembly instructions of OpenCL/SYCL/CUDA kernels for roofline model analysis.
☆57Mar 20, 2025Updated last year
Alternatives and similar repositories for PTXprofiler
Users that are interested in PTXprofiler are comparing it to the libraries listed below.
- A Top-Down Profiler for GPU Applications☆22Feb 29, 2024Updated 2 years ago
- ☆10May 12, 2022Updated 3 years ago
- Runs a single CUDA/OpenCL kernel, taking its source from a file and arguments from the command-line☆26Updated this week
- Rebuild YatSenOS On RISC-V 64.☆23Jan 6, 2022Updated 4 years ago
- study of cutlass☆22Nov 10, 2024Updated last year
- ngAP's artifact for ASPLOS'24☆25Jul 29, 2025Updated 9 months ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆36Oct 13, 2024Updated last year
- OpenCL is the most powerful programming language ever created. Yet the OpenCL C++ bindings are cumbersome and the code overhead prevents …☆475Apr 7, 2026Updated 3 weeks ago
- A small OpenCL benchmark program to measure peak GPU/CPU performance.☆293Mar 19, 2026Updated last month
- CUDA grammar for tree-sitter☆33Nov 23, 2025Updated 5 months ago
- GPGPU-SIM usage guide☆14Nov 12, 2022Updated 3 years ago
- simple port of hpl-2.0 to use NVIDIA GPU acceleration with CUBLAS☆29May 13, 2013Updated 12 years ago
- Speeding Up Your Python Codes 1000x☆12Apr 2, 2025Updated last year
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions.☆93Apr 14, 2026Updated 2 weeks ago
- The translator that supports translating NVPTX to SPIR-V. This translator is modified from LLVM-SPIR-V Translator.☆45Oct 25, 2021Updated 4 years ago
- Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner☆21Sep 12, 2025Updated 7 months ago
- Unit benchmarks of CUDA event APIs.☆17Apr 23, 2024Updated 2 years ago
- A native GPU bytecode compiler for constructive solid geometry☆25May 29, 2019Updated 6 years ago
- A Symbolic Emulator for Shuffle Synthesis on the NVIDIA PTX Code☆16Mar 19, 2023Updated 3 years ago
- An Embedded GPU Profiler for Dear ImGui App☆67Aug 23, 2025Updated 8 months ago
- Cheap: customized heaps for improved application performance.☆28Oct 11, 2022Updated 3 years ago
- ☆23Dec 18, 2025Updated 4 months ago
- Set sail on anime computer graphics!☆15Mar 1, 2023Updated 3 years ago
- A GPU benchmark suite for autotuners☆19Feb 20, 2024Updated 2 years ago
- ☆24Jun 12, 2023Updated 2 years ago
- Drastically Reducing the Number of Trainable Parameters in Deep CNNs by Inter-layer Kernel-sharing☆14Mar 28, 2023Updated 3 years ago
- ☆41Apr 3, 2022Updated 4 years ago
- A distributed key value database based on LSM Tree storage☆15Aug 24, 2022Updated 3 years ago
- Generate publication-quality figures using python☆23Jun 5, 2016Updated 9 years ago
- ☆58Mar 12, 2026Updated last month
- An unofficial cuda assembler, for all generations of SASS, hopefully :)☆85Mar 20, 2023Updated 3 years ago
- a simple API to use CUPTI☆10Aug 19, 2025Updated 8 months ago
- A pure-Python implementation of the Nvidia CuTe layout algebra intended to be approachable and easy to learn.☆173Updated this week
- Yet another toy CPU.☆92Dec 10, 2023Updated 2 years ago
- eBPF tool to collect BOLT profile☆14Apr 9, 2026Updated 3 weeks ago
- Parallel Bytecode Interpreter For Heterogeneous Hardware☆15Aug 27, 2021Updated 4 years ago
- Horizontal Fusion☆24Jan 7, 2022Updated 4 years ago
- A tool for cross-checking Verilog compilers☆15Apr 16, 2025Updated last year
- Paging Debug tool for GDB using python☆13Jun 4, 2022Updated 3 years ago