hwang2006 / CUDA-Accelerated-Computing

☆11

Alternatives and similar repositories for CUDA-Accelerated-Computing

Users who are interested in CUDA-Accelerated-Computing are comparing it to the repositories listed below.

Yufeng98 / CENT
Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025
☆71 Updated last month
PSAL-POSTECH / M2NDP-public
A cycle-level simulator for M2NDP
☆28 Updated last month
casys-kaist / NeuPIMs
NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing
☆83 Updated last year
platformxlab / G10
☆36 Updated last year
Sys-KU / AutoTiering
[USENIX ATC '21] Exploring the Design Space of Page Management for Multi-Tiered Memory Systems
☆47 Updated 3 years ago
miglopst / PIM_NDP_papers
☆65 Updated 4 years ago
accel-sim / gpu-app-collection
A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions.
☆66 Updated last month
CMU-SAFARI / prim-benchmarks
PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is dev…
☆157 Updated last year
AIS-SNU / PID-Comm
☆24 Updated 6 months ago
hpdps-group / coccl
COCCL: Compression and precision co-aware collective communication library
☆22 Updated 3 months ago
astra-sim / libra
LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models
☆11 Updated last year
argonne-lcf / dlio_benchmark
An I/O benchmark for deep learning applications
☆87 Updated last week
PrincetonUniversity / LLMCompass
☆154 Updated 11 months ago
ruipeterpan / marconi
Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Honorable Mention]
☆13 Updated 3 months ago
sitar-lab / NeuSight
☆42 Updated this week
accel-sim / gpgpu-sim_distribution
GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated…
☆54 Updated last week
barabanshek / sabre
☆18 Updated last year
mosharaf / cse585
Advanced Scalable Systems for X
☆34 Updated 6 months ago
ece-fast-lab / ASPLOS-2025-M5
This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …
☆13 Updated 2 months ago
casys-kaist / HUVM
☆23 Updated 2 years ago
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆37 Updated last year
astra-sim / tacos
TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning
☆23 Updated 2 weeks ago
VIA-Research / uPIMulator
☆143 Updated 4 months ago
casys-kaist / LLMServingSim
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
☆120 Updated last week
ranggihwang / Pregated_MoE
☆48 Updated last year
spypaul / MQSim_CXL_Linux
☆26 Updated 2 years ago
WangYaohuii / CXL-SSD-Sim
A Full-System Simulator for CXL-Based SSD Memory System
☆28 Updated 6 months ago
merthidayetoglu / HiCCL
A hierarchical collective communications library with portable optimizations
☆35 Updated 6 months ago
Sys-KU / DeepPlan
[ACM EuroSys '23] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆56 Updated last year
Systems-ShiftLab / MultiPIM
MultiPIM: A Detailed and Configurable Multi-Stack Processing-In-Memory Simulator
☆55 Updated 4 years ago