Multi-V-VM / hetGPU
PTX on XPUs
☆72 Updated 2 weeks ago
Alternatives and similar repositories for hetGPU
Users that are interested in hetGPU are comparing it to the libraries listed below
- PTX-EMU is a simple emulator for CUDA programs. ☆37 Updated 6 months ago
- A scheduling framework for multitasking over diverse XPUs, including GPUs, NPUs, ASICs, and FPGAs ☆129 Updated 2 weeks ago
- ☆90 Updated 7 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 Updated last year
- Fast OS-level support for GPU checkpoint and restore ☆252 Updated last month
- Virtuoso is a fast, accurate and versatile simulation framework designed for virtual memory research. Virtuoso uses a new simulation met… ☆75 Updated 3 weeks ago
- CXL remote offloading data movement aware compiler ☆16 Updated 3 weeks ago
- Automatic virtualization of (general) accelerators. ☆45 Updated 2 years ago
- REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU sche… ☆102 Updated 2 years ago
- Triton to TVM transpiler. ☆22 Updated last year
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications. ☆26 Updated last year
- Asynchronous semantics for architectural simulation and synthesis. ☆55 Updated this week
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆155 Updated 3 months ago
- WaferLLM: Large Language Model Inference at Wafer Scale ☆65 Updated this week
- This serves as a repository for reproducibility of the SC21 paper "In-Depth Analyses of Unified Virtual Memory System for GPU Accelerated… ☆36 Updated 2 years ago
- ☆201 Updated 3 months ago
- Tutorials for NVIDIA CUPTI samples ☆37 Updated 2 months ago
- Handwritten GEMM using Intel AMX (Advanced Matrix Extension) ☆17 Updated 9 months ago
- Open ABI and FFI for Machine Learning Systems ☆152 Updated this week
- DeepSeek-V3/R1 inference performance simulator ☆170 Updated 7 months ago
- ☆22 Updated 3 months ago
- LLVM OpenCL C compiler suite for ventus GPGPU ☆57 Updated last week
- A compiler to automatically transform applications into disaggregated memory apps. ☆16 Updated last year
- matmul using AMX instructions ☆20 Updated last year
- ☆50 Updated last year
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS ☆31 Updated 8 months ago
- A GPU FP32 computation method with Tensor Cores. ☆22 Updated 2 years ago
- Ensō is a high-performance streaming interface for NIC-application communication. ☆76 Updated 2 months ago
- ☆61 Updated 5 months ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of … ☆29 Updated 10 months ago