eunomia-bpf / basic-cuda-tutorial
A collection of CUDA programming examples to learn GPU programming
☆52 Updated 2 months ago
Alternatives and similar repositories for basic-cuda-tutorial
Users interested in basic-cuda-tutorial are comparing it to the repositories listed below
- ☆222 Updated 2 weeks ago
- Hooked CUDA-related dynamic libraries by using automated code generation tools. ☆172 Updated 2 years ago
- ☆20 Updated 6 months ago
- A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs ☆150 Updated last week
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes ☆144 Updated 9 months ago
- qCUDA: GPGPU Virtualization at a New API Remoting Method with Para-virtualization ☆132 Updated 3 years ago
- matmul using AMX instructions ☆22 Updated last year
- PTX on XPUs ☆115 Updated last week
- A compiler to automatically transform applications into disaggregated memory apps. ☆16 Updated 2 years ago
- Kernel Extensions Large Language Model Agent ☆33 Updated last year
- Fast OS-level support for GPU checkpoint and restore ☆267 Updated 3 months ago
- ☆51 Updated last year
- [NSDI '24] DINT: Fast In-Kernel Distributed Transactions with eBPF ☆53 Updated last year
- An MLIR-based AI compiler designed for Python frontend to RISC-V DSA ☆13 Updated last year
- Source code for the FAST '23 paper “MadFS: Per-File Virtualization for Userspace Persistent Memory Filesystems” ☆45 Updated 2 years ago
- Source code of KVM ☆18 Updated 3 years ago
- The official implementation of OSDI'25 paper BlitzScale ☆37 Updated 3 months ago
- An OS kernel module for fast **remote** fork using advanced datacenter networking (RDMA). ☆70 Updated 10 months ago
- ☆58 Updated last year
- ☆53 Updated last month
- DeepSeek-V3/R1 inference performance simulator ☆175 Updated 9 months ago
- Source code for the virtualization book ☆93 Updated last week
- PTX-EMU is a simple emulator for CUDA programs. ☆38 Updated 8 months ago
- GeminiFS: A Companion File System for GPUs ☆70 Updated 10 months ago
- Userspace eBPF Runtime Benchmarking Test Suite and Results ☆17 Updated last year
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆123 Updated 2 weeks ago
- Example code for using DC QP to provide RDMA READ and WRITE operations to remote GPU memory ☆152 Updated last year
- RDMA kernel modules and drivers implemented with Rust and eBPF ☆20 Updated 3 years ago
- Assembler and Decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs. ☆95 Updated 2 years ago
- Paella: Low-latency Model Serving with Virtualized GPU Scheduling ☆66 Updated last year