n-eiling / cuda-fatbin-decompression
☆10 Updated last year
Related projects
Alternatives and complementary repositories for cuda-fatbin-decompression
- ☆10 Updated 4 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆22 Updated last month
- Optimize GEMM with tensorcore step by step ☆15 Updated 11 months ago
- Triton to TVM transpiler. ☆16 Updated last month
- PTX-EMU is a simple emulator for CUDA programs. ☆24 Updated 10 months ago
- Automatic virtualization of (general) accelerators. ☆40 Updated last year
- A GPU FP32 computation method with Tensor Cores. ☆18 Updated 2 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches ☆14 Updated 5 years ago
- Skyloft: A General High-Efficient Scheduling Framework in User Space (SOSP 2024) ☆22 Updated 2 months ago
- CUPTI GPU Profiler ☆37 Updated 5 years ago
- CUDAAdvisor: a GPU profiling tool ☆48 Updated 6 years ago
- Torch Frontend for IREE ☆25 Updated 11 months ago
- ☆50 Updated 5 years ago
- HeteroSync is a benchmark suite for performing fine-grained synchronization on tightly coupled GPUs ☆27 Updated 2 months ago
- Emulating DMA Engines on GPUs for Performance and Portability ☆34 Updated 9 years ago
- An IR for efficiently simulating distributed ML computation. ☆25 Updated 10 months ago
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction. ☆11 Updated last year
- Data-Centric MLIR dialect ☆38 Updated last year
- A Top-Down Profiler for GPU Applications ☆13 Updated 8 months ago
- A Python script to convert the output of NVIDIA Nsight Systems (in SQLite format) to JSON in Google Chrome Trace Event Format. ☆22 Updated 2 months ago
- Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems” ☆34 Updated last year
- GPU Performance Advisor ☆63 Updated 2 years ago
- Mille Crepe Bench: layer-wise performance analysis for deep learning frameworks. ☆17 Updated 5 years ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 Updated 6 months ago
- ☆40 Updated 3 years ago
- ☆27 Updated last year
- CAKE Library for constant-bandwidth matrix multiplication on CPUs ☆14 Updated 7 months ago
- ☆29 Updated 2 years ago
- A new memory mapping interface for efficient direct user-space access to byte-addressable storage, published in MICRO2022. ☆14 Updated 2 years ago
- ☆47 Updated 5 years ago