hwang2006 / CUDA-Accelerated-Computing
☆11Updated last month
Alternatives and similar repositories for CUDA-Accelerated-Computing
Users who are interested in CUDA-Accelerated-Computing are comparing it to the repositories listed below
- Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025☆71Updated last month
- A cycle-level simulator for M2NDP☆28Updated last month
- NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing☆83Updated last year
- ☆36Updated last year
- [USENIX ATC '21] Exploring the Design Space of Page Management for Multi-Tiered Memory Systems☆47Updated 3 years ago
- ☆65Updated 4 years ago
- A repository where GPU applications are aggregated using a common build flow that supports multiple CUDA versions.☆66Updated last month
- PrIM (Processing-In-Memory benchmarks) is the first benchmark suite for a real-world processing-in-memory (PIM) architecture. PrIM is dev…☆157Updated last year
- ☆24Updated 6 months ago
- COCCL: Compression and precision co-aware collective communication library☆22Updated 3 months ago
- LIBRA: Enabling Workload-aware Multi-dimensional Network Topology Optimization for Distributed Training of Large AI Models☆11Updated last year
- An I/O benchmark for deep learning applications☆87Updated last week
- ☆154Updated 11 months ago
- Artifact for "Marconi: Prefix Caching for the Era of Hybrid LLMs" [MLSys '25 Outstanding Paper Honorable Mention]☆13Updated 3 months ago
- ☆42Updated this week
- GPGPU-Sim provides a detailed simulation model of a contemporary GPU running CUDA and/or OpenCL workloads and now includes an integrated…☆54Updated last week
- ☆18Updated last year
- Advanced Scalable Systems for X☆34Updated 6 months ago
- This is the repository that holds the artifacts of ASPLOS'25 -- M5: Mastering Page Migration and Memory Management for CXL-based Tiered …☆13Updated 2 months ago
- ☆23Updated 2 years ago
- Thunder Research Group's Collective Communication Library☆37Updated last year
- TACOS: [T]opology-[A]ware [Co]llective Algorithm [S]ynthesizer for Distributed Machine Learning☆23Updated 2 weeks ago
- ☆143Updated 4 months ago
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale☆120Updated last week
- ☆48Updated last year
- ☆26Updated 2 years ago
- A Full-System Simulator for CXL-Based SSD Memory System☆28Updated 6 months ago
- A hierarchical collective communications library with portable optimizations☆35Updated 6 months ago
- [ACM EuroSys '23] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access☆56Updated last year
- MultiPIM: A Detailed and Configurable Multi-Stack Processing-In-Memory Simulator☆55Updated 4 years ago