coderonion / awesome-cuda-and-hpc
🔥🔥🔥 A collection of some awesome public CUDA, cuBLAS, TensorRT and High Performance Computing (HPC) projects.
☆156 · Updated last month
Related projects
Alternatives and complementary repositories for awesome-cuda-and-hpc
- A scalable High-Level Synthesis framework on MLIR ☆228 · Updated 5 months ago
- CSV spreadsheets and other material for AI accelerator survey papers ☆153 · Updated 9 months ago
- An FPGA-based CNN accelerator, following Google's TPU v1. ☆119 · Updated 5 years ago
- FREE TPU V3plus for FPGA is the free version of a commercial AI processor (EEP-TPU) for Deep Learning EDGE Inference ☆108 · Updated last year
- An FPGA Accelerator for Transformer Inference ☆73 · Updated 2 years ago
- ☆142 · Updated 5 months ago
- Hardware design of a universal NPU (CNN accelerator) for various convolutional neural networks ☆71 · Updated last week
- ☆141 · Updated last week
- A matrix extension proposal for AI applications under RISC-V architecture ☆106 · Updated 2 weeks ago
- Repository to host and maintain scale-sim-v2 code ☆233 · Updated this week
- ☆59 · Updated 2 months ago
- GPGPU supporting RISC-V, developed with Verilog HDL ☆68 · Updated 2 months ago
- Research and Materials on Hardware implementation of Transformer Model ☆206 · Updated last week
- NVDLA (An Open-Source DL Accelerator Framework) implementation on FPGA. ☆306 · Updated 10 months ago
- ☆35 · Updated last year
- CNN accelerator implemented with SpinalHDL ☆134 · Updated 9 months ago
- CHARM: Composing Heterogeneous Accelerators on Versal ACAP Architecture ☆123 · Updated this week
- IC implementation of a systolic array for a TPU ☆148 · Updated 3 weeks ago
- AMD University Program HLS tutorial ☆61 · Updated 2 weeks ago
- An open-source, parameterizable NPU generator with a full-stack, multi-target compilation stack for intelligent workloads. ☆27 · Updated 7 months ago
- A DNN accelerator implemented in RTL. ☆61 · Updated last year
- PyTorch model to RTL flow for low-latency inference ☆121 · Updated 7 months ago
- AutoSA: Polyhedral-Based Systolic Array Compiler ☆199 · Updated last year
- ☆122 · Updated 7 months ago
- Multi-core HW accelerator mapping optimization framework for layer-fused ML workloads. ☆38 · Updated this week
- This repo contains the assignments from Cornell Tech's ECE 5545 - Machine Learning Hardware and Systems, offered in Spring 2023 ☆20 · Updated last year
- IC implementation of TPU ☆86 · Updated 4 years ago
- Ventus GPGPU ISA Simulator Based on Spike ☆37 · Updated 2 weeks ago
- ☆45 · Updated 8 months ago
- FPGA based Vision Transformer accelerator (Harvard CS205) ☆84 · Updated 11 months ago