Emulating DMA Engines on GPUs for Performance and Portability
☆41 · May 17, 2015 · Updated 10 years ago
Alternatives and similar repositories for CudaDMA
Users who are interested in CudaDMA are comparing it to the libraries listed below.
- A Sound and Complete Verification Tool for Warp-Specialized GPU Kernels ☆19 · Jun 17, 2015 · Updated 10 years ago
- NS3 simulator for RDMA load balancing ☆11 · Jan 31, 2025 · Updated last year
- Multiple 1-stencil implementations using NVIDIA CUDA. ☆13 · Dec 2, 2017 · Updated 8 years ago
- ZC RISCV CORE ☆12 · Dec 19, 2019 · Updated 6 years ago
- General Purpose Graphics Processing Unit (GPGPU) IP Core ☆11 · Jul 4, 2014 · Updated 11 years ago
- FPGA CryptoNight V7 Miner ☆31 · Aug 26, 2019 · Updated 6 years ago
- High-speed GEMV kernels, up to 2.7x speedup over the PyTorch baseline. ☆128 · Jul 13, 2024 · Updated last year
- A tiny FP8 multiplication unit written in Verilog. TinyTapeout 2 submission. ☆14 · Nov 23, 2022 · Updated 3 years ago
- ☆11 · Aug 8, 2021 · Updated 4 years ago
- ☆18 · Apr 8, 2022 · Updated 3 years ago
- Parallel Algorithms for Octree Meshing ☆12 · Dec 31, 2015 · Updated 10 years ago
- Transformers components but in Triton ☆34 · May 9, 2025 · Updated 9 months ago
- cuASR: CUDA Algebra for Semirings ☆44 · Aug 22, 2022 · Updated 3 years ago
- A low-level transport Linux kernel module for bulk low-latency data transfers between two SoCs over PCIe NTB ☆20 · May 2, 2023 · Updated 2 years ago
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust. ☆14 · Nov 23, 2024 · Updated last year
- Classical molecular dynamics proxy application. ☆31 · Jun 29, 2020 · Updated 5 years ago
- ☆178 · May 7, 2025 · Updated 9 months ago
- Structured PIC proxy app based on Cabana ☆15 · Jun 30, 2025 · Updated 8 months ago
- Deep learning accelerator for convolutional layer (convolution operation) and fully-connected layer (matrix multiplication). ☆20 · Nov 18, 2018 · Updated 7 years ago
- ☆40 · Feb 28, 2020 · Updated 6 years ago
- ☆20 · Nov 12, 2025 · Updated 3 months ago
- Fast and efficient attention method exploration and implementation. ☆25 · Mar 25, 2025 · Updated 11 months ago
- TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure. ☆19 · May 12, 2024 · Updated last year
- Test suite for probing the numerical behavior of NVIDIA tensor cores ☆43 · Jul 24, 2024 · Updated last year
- A clone of POCL that includes RISC-V newlib devices support and Vortex ☆49 · Jan 14, 2026 · Updated last month
- Heterogeneous Accelerated Computed Cluster (HACC) Resources Page ☆22 · Oct 7, 2025 · Updated 4 months ago
- double_fpu_verilog ☆20 · Jul 17, 2014 · Updated 11 years ago
- Fast and Efficient Deep Learning Library in C ☆18 · Jun 3, 2022 · Updated 3 years ago
- ☆14 · Jul 28, 2016 · Updated 9 years ago
- Gallatin is a general-purpose memory manager for CUDA that allows for threads to quickly malloc and free memory of arbitrary size inside … ☆25 · Feb 4, 2026 · Updated 3 weeks ago
- Replace original DRAM model in GPGPU-sim with Ramulator DRAM model ☆21 · Dec 10, 2018 · Updated 7 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆17 · Mar 13, 2023 · Updated 2 years ago
- MiniAero Unstructured Finite Volume Compressible Navier-Stokes Mini-App ☆22 · May 9, 2024 · Updated last year
- Soleil-X is a turbulence/particle/radiation solver written in the Regent language for execution with the Legion runtime. ☆17 · Jul 16, 2025 · Updated 7 months ago
- A set of radiation transport mini-applications used for performance optimization on HPC systems. ☆29 · Aug 14, 2018 · Updated 7 years ago
- Linux Cross-Memory Attach ☆96 · Feb 18, 2026 · Updated last week
- Thunder Research Group's Collective Communication Library ☆47 · Jul 8, 2025 · Updated 7 months ago
- nvptx-tools: a collection of tools for use with nvptx-none GCC toolchains. ☆51 · Sep 5, 2024 · Updated last year
- ☆22 · Feb 18, 2025 · Updated last year