VivekPanyam / cudaparsers
Parsers for CUDA binary files
☆22 Updated last year
Alternatives and similar repositories for cudaparsers
Users who are interested in cudaparsers are comparing it to the libraries listed below.
- Intel® Instrumentation and Tracing Technology (ITT) and Just-In-Time (JIT) APIs ☆117 Updated 3 weeks ago
- Tenstorrent system interface library ☆28 Updated this week
- Virtual machine for executing CUDA PTX without a GPU ☆38 Updated last year
- A fast and accurate reuse distance analyzer for multi-threaded applications. It leverages existing hardware features in commodity CPUs. ☆19 Updated 2 years ago
- An attempt at safe imperative GPU programming. ☆44 Updated 2 weeks ago
- VectorVisor is a vectorizing binary translator for GPUs, designed to make it easy to run many copies of a single-threaded WebAssembly pro… ☆150 Updated 9 months ago
- Rust bindings to the MLIR C API. ☆65 Updated last month
- ☆77 Updated this week
- Tools and experiments for 0sim. Simulate system software behavior on machines with terabytes of main memory from your desktop. ☆21 Updated 5 years ago
- Re-implementation of the TASO compiler using equality saturation ☆130 Updated 4 years ago
- Exploring the scalable matrix extension of the Apple M4 processor ☆187 Updated 8 months ago
- Heterogeneous Containerization of Large Language Model Apps ☆45 Updated last month
- PTX-EMU is a simple emulator for CUDA programs. ☆34 Updated 2 months ago
- A lightweight memory allocator for hardware-accelerated machine learning ☆154 Updated 3 months ago
- ☆13 Updated 4 years ago
- Rex is a safe and usable kernel extension framework that allows loading and executing Rust kernel extension programs in the place of eBPF… ☆68 Updated this week
- A verified library of synchronization primitives and concurrent data structures ☆36 Updated this week
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆25 Updated 9 months ago
- Source code for the paper "Profile Guided Optimization without Profiles: A Machine Learning Approach" ☆25 Updated 3 years ago
- Assured confidential execution (ACE) implements VM-based trusted execution environment (TEE) for embedded RISC-V systems with focus on a … ☆170 Updated this week
- Super fast FP32 matrix multiplication on RDNA3 ☆68 Updated 3 months ago
- MLIR metal dialect ☆28 Updated 10 months ago
- A zero-copy serialization library and networking stack. ☆47 Updated last year
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆107 Updated this week
- A description of Minotaur can be found in https://arxiv.org/abs/2306.00229. ☆110 Updated 11 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 Updated last year
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆28 Updated 6 months ago
- MPIWasm is a WebAssembly Embedder based on Wasmer that enables the high-performance execution of MPI applications compiled to Wasm. (ACM … ☆19 Updated last year
- Experimental ONNX implementation for WASI NN. ☆48 Updated 3 years ago
- Embedded Universal DSL: a good DSL for us, by us ☆40 Updated this week