hwang2006 / CUDA-Accelerated-Computing
☆11 · Updated 6 months ago
Alternatives and similar repositories for CUDA-Accelerated-Computing
Users who are interested in CUDA-Accelerated-Computing are comparing it to the repositories listed below.
- ☆53 · Updated 4 months ago
- ☆199 · Updated last week
- Advanced Matrix Extensions (AMX) Guide ☆105 · Updated 3 years ago
- LLM inference analyzer for different hardware platforms ☆94 · Updated 3 months ago
- NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing ☆98 · Updated last year
- NEO is an LLM inference engine built to ease the GPU memory crisis through CPU offloading ☆67 · Updated 4 months ago
- A highly flexible GPU simulator for AMD GPUs. ☆193 · Updated last week
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale ☆146 · Updated 3 months ago
- ☆27 · Updated 11 months ago
- WaferLLM: Large Language Model Inference at Wafer Scale ☆63 · Updated last week
- PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is dev… ☆162 · Updated last year
- Artifact for the paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025 ☆100 · Updated 6 months ago
- UPMEM LLM Framework allows profiling PyTorch layers and functions and simulating those layers/functions with a given hardware profile. ☆36 · Updated 2 months ago
- LLM serving cluster simulator ☆116 · Updated last year
- ☆154 · Updated last year
- ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage ☆48 · Updated last month
- ☆78 · Updated 3 years ago
- ONNXim is a fast cycle-level simulator that can model multi-core NPUs for DNN inference ☆157 · Updated 8 months ago
- AMD RAD's multi-GPU Triton-based framework for seamless multi-GPU programming ☆93 · Updated last week
- This is the top-level repository for the Accel-Sim framework. ☆499 · Updated this week
- A cycle-level simulator for M2NDP ☆32 · Updated 2 months ago
- ☆22 · Updated 6 months ago
- DeepSeek-V3/R1 inference performance simulator ☆170 · Updated 7 months ago
- ☆55 · Updated last year
- Tenstorrent MLIR compiler ☆206 · Updated this week
- ☆100 · Updated last year
- ☆131 · Updated last week
- Allo: A Programming Model for Composable Accelerator Design ☆292 · Updated this week
- ☆156 · Updated 9 months ago
- ☆13 · Updated 6 months ago