matrix multiplication in CUDA
☆126Aug 10, 2023Updated 2 years ago
Alternatives and similar repositories for matrix-cuda
Users that are interested in matrix-cuda are comparing it to the libraries listed below.
- Large matrix multiplication in CUDA☆17Oct 20, 2023Updated 2 years ago
- Matrix Multiplication on GPU using Shared Memory considering Coalescing and Bank Conflicts☆26Aug 29, 2022Updated 3 years ago
- This is a C++ implementation of an LSTM Neural Network parallelized for a GPU using CUDA☆25Oct 29, 2017Updated 8 years ago
- CUDA official sample codes☆372Oct 6, 2015Updated 10 years ago
- A high performance implementation of kmeans algorithm with cuda☆18Sep 7, 2014Updated 11 years ago
- Official Implementation of "RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs"☆28Jul 23, 2025Updated 10 months ago
- Convolutional Neural Network of vgg19 model using Cuda to accelerate☆12Jun 11, 2018Updated 8 years ago
- A Vector Caching Scheme for Streaming FPGA SpMV Accelerators☆10Sep 7, 2015Updated 10 years ago
- Musings in GEMM (General Matrix Multiplication)☆14Dec 14, 2025Updated 6 months ago
- CUDA implementation of Image Completion Using Global Optimization (Nikos Komodakis and Georgios Tziritas)☆21Mar 19, 2020Updated 6 years ago
- Vim plugin for Bluespec SystemVerilog (BSV)☆12Nov 8, 2020Updated 5 years ago
- ☆121Apr 11, 2024Updated 2 years ago
- Official Implementation of "LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference"☆25Nov 12, 2023Updated 2 years ago
- Example of accelerating a neural network with C++ and CUDA (implements matrix addition and matrix multiplication)☆56Sep 19, 2021Updated 4 years ago
- ☆12Nov 8, 2019Updated 6 years ago
- HCC Sample Applications☆13Jan 3, 2017Updated 9 years ago
- A C++/CUDA toolkit for Transformer (NMT) Translator (Decoder)☆17Jan 7, 2019Updated 7 years ago
- Distributed k-nearest Neighbors using Locality Sensitive Hashing and SYCL☆10Jun 12, 2026Updated last week
- Source code examples from the Parallel Forall Blog☆1,330Sep 23, 2025Updated 8 months ago
- Automatic ReLU Reduction☆15Dec 20, 2023Updated 2 years ago
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU"☆31Jul 7, 2020Updated 5 years ago
- A C++ library for computing large scale tensor contractions.☆38Jun 26, 2018Updated 7 years ago
- A Bluespec SystemVerilog library of miscellaneous components☆18May 19, 2026Updated last month
- Step-by-step optimization of CUDA SGEMM☆477Mar 30, 2022Updated 4 years ago
- study of cutlass☆22Nov 10, 2024Updated last year
- Benchmark tests supporting the TiledCUDA library.☆19Nov 19, 2024Updated last year
- An HBM FPGA based SpMV Accelerator☆18Aug 29, 2024Updated last year
- clEsperanto - GPU-accelerated image processing across languages and platforms☆11Feb 13, 2021Updated 5 years ago
- Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner☆21Sep 12, 2025Updated 9 months ago
- ☆11Oct 15, 2020Updated 5 years ago
- Classify traffic signs using deep neural networks☆13Mar 5, 2017Updated 9 years ago
- Official implementation of "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training"☆43Mar 4, 2024Updated 2 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs☆16Feb 28, 2019Updated 7 years ago
- Network- and GPU-aware management of serverless functions at the edge☆15Mar 3, 2023Updated 3 years ago
- ☆10May 21, 2026Updated 3 weeks ago
- Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL)☆24Feb 12, 2024Updated 2 years ago
- ☆13Dec 17, 2021Updated 4 years ago
- ☆18Apr 8, 2022Updated 4 years ago
- A program for downloading sci literature☆10May 10, 2018Updated 8 years ago