coderonion / awesome-cuda-and-hpc
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.
★157, updated last month
Related projects
Alternatives and complementary repositories for awesome-cuda-and-hpc
- CSV spreadsheets and other material for AI accelerator survey papers (★154, updated 9 months ago)
- An FPGA Accelerator for Transformer Inference (★73, updated 2 years ago)
- FREE TPU V3plus for FPGA is the free version of a commercial AI processor (EEP-TPU) for Deep Learning EDGE Inference (★109, updated last year)
- (★143, updated 5 months ago)
- (★37, updated last year)
- CHARM: Composing Heterogeneous Accelerators on Versal ACAP Architecture (★124, updated 2 weeks ago)
- (★143, updated 2 weeks ago)
- A scalable High-Level Synthesis framework on MLIR (★228, updated 6 months ago)
- Hardware design of a universal NPU (CNN accelerator) for various convolutional neural networks (★75, updated this week)
- A matrix extension proposal for AI applications under RISC-V architecture (★109, updated 3 weeks ago)
- Research and Materials on Hardware implementation of Transformer Model (★211, updated 2 weeks ago)
- [HPCA'21] SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning (★76, updated 2 months ago)
- AutoSA: Polyhedral-Based Systolic Array Compiler (★200, updated last year)
- (★42, updated 5 years ago)
- Multi-core HW accelerator mapping optimization framework for layer-fused ML workloads. (★40, updated last week)
- IC implementation of Systolic Array for TPU (★152, updated last month)
- Edge-MoE: Memory-Efficient Multi-Task Vision Transformer Architecture with Task-level Sparsity via Mixture-of-Experts (★87, updated 6 months ago)
- An open-source parameterizable NPU generator with full-stack multi-target compilation stack for intelligent workloads. (★31, updated 7 months ago)
- Repository to host and maintain scale-sim-v2 code (★233, updated 2 weeks ago)
- GPGPU supporting RISCV-V, developed with Verilog HDL (★68, updated 3 months ago)
- PyTorch model to RTL flow for low latency inference (★121, updated 8 months ago)
- You can run it on PYNQ-Z1. The repository contains the relevant Verilog code, Vivado configuration and C code for SDK testing. The size o… (★118, updated 7 months ago)
- This is a series of quick-start guides to the Vitis HLS tool, in Chinese. It explains the basic concepts and the most important optimize techni… (★18, updated 2 years ago)
- (★60, updated this week)
- AMD University Program HLS tutorial (★63, updated 3 weeks ago)
- CNN accelerator implemented with Spinal HDL (★136, updated 9 months ago)
- [TCAD'23] AccelTran: A Sparsity-Aware Accelerator for Transformers (★33, updated 11 months ago)
- (★41, updated 3 years ago)
- Ventus GPGPU ISA Simulator Based on Spike (★37, updated 3 weeks ago)
- FlexGripPlus: an open-source GPU model for reliability evaluation and micro architectural simulation (★85, updated last year)
- [HPCA 2023] ViTCoD: Vision Transformer Acceleration via Dedicated Algorithm and Accelerator Co-Design (★97, updated last year)