VivekPanyam / cudaparsers
Parsers for CUDA binary files
☆25 · Updated last year
Alternatives and similar repositories for cudaparsers
Users who are interested in cudaparsers are comparing it to the libraries listed below.
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs ☆126 · Updated 2 weeks ago
- A fast and accurate reuse distance analyzer for multi-threaded applications. It leverages existing hardware features in commodity CPUs. ☆21 · Updated 2 years ago
- VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly pro… ☆155 · Updated last year
- Embedded Universal DSL: a good DSL for us, by us ☆58 · Updated this week
- ☆85 · Updated this week
- MLIR metal dialect ☆34 · Updated last year
- A description of Minotaur can be found in https://arxiv.org/abs/2306.00229. ☆119 · Updated 3 months ago
- Tenstorrent system interface library ☆33 · Updated this week
- Rust bindings to the MLIR C API. ☆69 · Updated last month
- Exploring the scalable matrix extension of the Apple M4 processor ☆213 · Updated last year
- Virtual machine for executing CUDA PTX without a GPU ☆41 · Updated 2 years ago
- Re-implementation of the TASO compiler using equality saturation ☆136 · Updated 4 years ago
- TiledKernel is a code generation library based on macro kernels and a memory hierarchy graph data structure. ☆19 · Updated last year
- SquirrelFS: A crash-consistent Rust file system for persistent memory (OSDI 24) ☆63 · Updated 7 months ago
- 💀 The former home of clangir, now part of the official LLVM incubator. See website below for details. ☆155 · Updated 3 years ago
- Asynchronous Rust bindings for UCX ☆78 · Updated 7 months ago
- Rex is a safe and usable kernel extension framework that allows loading and executing Rust kernel extension programs in the place of eBPF… ☆123 · Updated 2 weeks ago
- A lightweight memory allocator for hardware-accelerated machine learning ☆176 · Updated 2 months ago
- ☆29 · Updated 2 years ago
- A zero-copy serialization library and networking stack. ☆49 · Updated last year
- Simplify the use of performance counters. ☆64 · Updated 3 years ago
- Benchmarks for auto-vectorization and revectorization, including both hand-vectorized and scalar code ☆29 · Updated 6 years ago
- An educational implementation of a modern compressor in Rust ☆48 · Updated 2 years ago
- PTX-EMU is a simple emulator for CUDA programs. ☆38 · Updated 7 months ago
- Super fast FP32 matrix multiplication on RDNA3 ☆81 · Updated 8 months ago
- A parser for PTX 6.5 ☆12 · Updated 2 years ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extensions) ☆17 · Updated 11 months ago
- "Enjoy yourself" virtualization software: a type-1 hypervisor ☆25 · Updated 3 years ago
- Resource Allocation for Dynamic Demands ☆21 · Updated last year
- Fast WebAssembly Baseline Compiler ☆61 · Updated 2 years ago