VivekPanyam / cudaparsers
Parsers for CUDA binary files
☆25 · Dec 29, 2023 · Updated 2 years ago
Alternatives and similar repositories for cudaparsers
Users who are interested in cudaparsers are comparing it to the libraries listed below.
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 · Nov 23, 2024 · Updated last year
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · May 12, 2024 · Updated last year
- A parser for PTX 6.5 ☆13 · Jun 19, 2023 · Updated 2 years ago
- A survey of manufacturer-provided DRAM operating parameters and timings as specified by DRAM chip datasheets from between 1970 and 2021. … ☆11 · May 4, 2022 · Updated 3 years ago
- ☆11 · Apr 3, 2023 · Updated 2 years ago
- Kernel Library Wheel for SGLang ☆17 · Updated this week
- ☆25 · Feb 20, 2024 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆27 · Oct 5, 2024 · Updated last year
- General Purpose Graphics Processing Unit (GPGPU) IP Core ☆11 · Jul 4, 2014 · Updated 11 years ago
- [WIP] Better (FP8) attention for Hopper ☆32 · Feb 24, 2025 · Updated 11 months ago
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. ☆14 · Nov 23, 2022 · Updated 3 years ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- ☆15 · Dec 16, 2021 · Updated 4 years ago
- Debug print operator for cudagraph debugging ☆14 · Aug 2, 2024 · Updated last year
- (WIP) Level up your shader game with the GPU + Rust advantage! ☆14 · Mar 4, 2024 · Updated last year
- Open Source SSD Controller. NVMe and Lightstor variants ☆18 · May 21, 2014 · Updated 11 years ago
- XML representation of the x86 instruction set ☆29 · Jan 17, 2026 · Updated 3 weeks ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension) ☆17 · Jan 11, 2025 · Updated last year
- Dynamic suballocators for external memory (e.g., Vulkan device memory). Unmaintained - consider migrating to https://crates.io/crates/offs… ☆15 · Jul 22, 2022 · Updated 3 years ago
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆14 · Apr 6, 2024 · Updated last year
- Quantized Attention on GPU ☆44 · Nov 22, 2024 · Updated last year
- Pseudo-LRU implementation using 1-bit per entry and achieving Full-LRU performance. ☆22 · Dec 17, 2022 · Updated 3 years ago
- Fundamental Sources for Water Wave Animation ☆20 · Dec 8, 2022 · Updated 3 years ago
- ☆20 · Sep 28, 2024 · Updated last year
- corundum work on vu13p ☆23 · Nov 10, 2023 · Updated 2 years ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆32 · Dec 21, 2024 · Updated last year
- ☆126 · Jan 22, 2026 · Updated 3 weeks ago
- TransferBench is a utility capable of benchmarking simultaneous copies between user-specified devices (CPUs/GPUs) ☆57 · Updated this week
- Framework to reduce autotune overhead to zero for well known deployments. ☆96 · Sep 19, 2025 · Updated 4 months ago
- A practical way of learning Swizzle ☆36 · Feb 3, 2025 · Updated last year
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆106 · Jun 28, 2025 · Updated 7 months ago
- Trace Replay and Network Simulation Framework ☆21 · Apr 14, 2021 · Updated 4 years ago
- Slides on how to give a technical talk ☆22 · Feb 8, 2021 · Updated 5 years ago
- ☆26 · Feb 17, 2025 · Updated last year
- Assembler and Decompiler for NVIDIA (Maxwell Pascal Volta Turing Ampere) GPUs. ☆94 · Feb 23, 2023 · Updated 2 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆27 · Oct 13, 2024 · Updated last year
- zMonkey is an open-source 200G network impairment emulator tool ☆22 · Mar 8, 2022 · Updated 3 years ago
- APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to… ☆30 · Jan 28, 2026 · Updated 2 weeks ago
- ☆42 · Nov 1, 2025 · Updated 3 months ago