lipracer / cuda-rt-hook
☆47Jul 16, 2025Updated 6 months ago
Alternatives and similar repositories for cuda-rt-hook
Users who are interested in cuda-rt-hook are comparing it to the libraries listed below
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆14Nov 23, 2024Updated last year
- auto deploy neovim like chxuan/vimplus☆12Apr 22, 2025Updated 9 months ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension)☆17Jan 11, 2025Updated last year
- ☆38Aug 7, 2025Updated 6 months ago
- Study materials collected while studying☆51Apr 16, 2022Updated 3 years ago
- ☆11Apr 3, 2023Updated 2 years ago
- FPGA-based HyperLogLog Accelerator☆12Jul 13, 2020Updated 5 years ago
- ☆47Dec 13, 2024Updated last year
- Artifact evaluation repo for EuroSys'24.☆29Nov 7, 2023Updated 2 years ago
- Paper list of federated learning: About system design☆13Apr 13, 2022Updated 3 years ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools.☆172Dec 12, 2023Updated 2 years ago
- ☆12May 13, 2025Updated 9 months ago
- ☆13Sep 11, 2020Updated 5 years ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- This is an official GitHub repository for the paper "Towards timeout-less transport in commodity datacenter networks".☆16Oct 12, 2021Updated 4 years ago
- Open Source SSD Controller. NVMe and Lightstor variants☆18May 21, 2014Updated 11 years ago
- Debug print operator for cudagraph debugging☆14Aug 2, 2024Updated last year
- Artifacts for ATC '22 paper "Faster Software Packet Processing on FPGA NICs with eBPF Program Warping"☆17May 20, 2022Updated 3 years ago
- An external memory allocator example for PyTorch.☆16Aug 10, 2025Updated 6 months ago
- SmartNIC☆14Dec 13, 2018Updated 7 years ago
- ☆34Nov 7, 2022Updated 3 years ago
- DeeperGEMM: crazy optimized version☆73May 5, 2025Updated 9 months ago
- NUMA-Aware Reader-Writer Locks☆19Jun 12, 2014Updated 11 years ago
- Johnson-Lindenstrauss transform (JLT), random projections (RP), fast Johnson-Lindenstrauss transform (FJLT), and randomized Hadamard tran…☆21Jul 11, 2023Updated 2 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆19Updated this week
- Accelerate LLM preference tuning via prefix sharing with a single line of code☆51Jul 4, 2025Updated 7 months ago
- GPTQ inference TVM kernel☆40Apr 25, 2024Updated last year
- Automatic virtualization of (general) accelerators.☆46Nov 28, 2022Updated 3 years ago
- ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)☆53Dec 17, 2024Updated last year
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Feb 23, 2024Updated last year
- Manages vllm-nccl dependency☆17Jun 3, 2024Updated last year
- ☆39Dec 14, 2025Updated last month
- Simple CuDNN wrapper☆20Nov 29, 2015Updated 10 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆17Mar 13, 2023Updated 2 years ago
- matmul using AMX instructions☆23May 7, 2024Updated last year
- [NSDI '24] DINT: Fast In-Kernel Distributed Transactions with eBPF☆53Jul 6, 2024Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention☆458May 30, 2025Updated 8 months ago
- Repo for OSDI 2023 paper: "Ship your Critical Section Not Your Data: Enabling Transparent Delegation with TCLocks"☆21Nov 6, 2024Updated last year
- corundum work on vu13p☆23Nov 10, 2023Updated 2 years ago