hwang2006 / CUDA-Accelerated-Computing
☆11 Updated 4 months ago
Alternatives and similar repositories for CUDA-Accelerated-Computing
Users who are interested in CUDA-Accelerated-Computing are comparing it to the repositories listed below.
- ATLAHS: An Application-centric Network Simulator Toolchain for AI, HPC, and Distributed Storage ☆39 Updated 3 weeks ago
- ☆19 Updated 5 months ago
- Advanced Matrix Extensions (AMX) Guide ☆98 Updated 3 years ago
- ☆187 Updated last year
- ☆26 Updated 9 months ago
- NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing ☆93 Updated last year
- ☆52 Updated 3 months ago
- This is the top-level repository for the Accel-Sim framework. ☆477 Updated this week
- ☆153 Updated last year
- ☆128 Updated this week
- A highly-flexible GPU simulator for AMD GPUs. ☆186 Updated this week
- Artifact for paper "PIM is All You Need: A CXL-Enabled GPU-Free System for LLM Inference", ASPLOS 2025 ☆90 Updated 4 months ago
- LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale ☆141 Updated 2 months ago
- Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO ☆32 Updated last year
- ☆52 Updated last year
- Allo: A Programming Model for Composable Accelerator Design ☆281 Updated this week
- LLM Inference analyzer for different hardware platforms ☆92 Updated 2 months ago
- ☆151 Updated 7 months ago
- ASTRA-sim2.0: Modeling Hierarchical Networks and Disaggregated Systems for Large-model Training at Scale ☆436 Updated 3 weeks ago
- An interference-aware scheduler for fine-grained GPU sharing ☆147 Updated 8 months ago
- LLM serving cluster simulator ☆114 Updated last year
- Unofficial description of the CUDA assembly (SASS) instruction sets. ☆144 Updated 2 months ago
- ☆91 Updated last year
- ☆17 Updated 6 months ago
- UPMEM LLM Framework allows profiling PyTorch layers and functions and simulate those layers/functions with a given hardware profile. ☆34 Updated last month
- DeepSeek-V3/R1 inference performance simulator ☆167 Updated 6 months ago
- 📚 A curated list of awesome matrix-matrix multiplication (A * B = C) frameworks, libraries and software ☆54 Updated 7 months ago
- ☆23 Updated 5 months ago
- A Cycle-level simulator for M2NDP ☆30 Updated last month
- Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores. ☆89 Updated 2 years ago