matrix multiplication in CUDA
☆125Aug 10, 2023Updated 2 years ago
Alternatives and similar repositories for matrix-cuda
Users that are interested in matrix-cuda are comparing it to the libraries listed below.
- ☆12Aug 22, 2023Updated 2 years ago
- Large matrix multiplication in CUDA☆17Oct 20, 2023Updated 2 years ago
- Using the pvanet framework to train mobilenet-v2 for object detection, paper: https://arxiv.org/abs/1611.08588☆13Feb 13, 2019Updated 7 years ago
- This repo contains the dataset for paper: Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code☆15Dec 1, 2023Updated 2 years ago
- Matrix Multiplication on GPU using Shared Memory considering Coalescing and Bank Conflicts☆26Aug 29, 2022Updated 3 years ago
- This repository contains my implementation of a shape-constrained network which predicts up to 170 FPS☆12Feb 12, 2019Updated 7 years ago
- This is a c++ implementation of an LSTM Neural Network parallelized for a GPU using CUDA☆25Oct 29, 2017Updated 8 years ago
- Convolutional neural network (VGG19 model) accelerated with CUDA☆12Jun 11, 2018Updated 7 years ago
- A Vector Caching Scheme for Streaming FPGA SpMV Accelerators☆10Sep 7, 2015Updated 10 years ago
- ☆120Apr 11, 2024Updated 2 years ago
- Official Implementation of "LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference"☆25Nov 12, 2023Updated 2 years ago
- ☆13Nov 8, 2019Updated 6 years ago
- BlueDBM hw/sw implementation using the bluespecpcie PCIe library☆12Dec 25, 2022Updated 3 years ago
- ☆23Dec 16, 2025Updated 4 months ago
- A simple high performance CUDA GEMM implementation.☆430Jan 4, 2024Updated 2 years ago
- HCC Sample Applications☆13Jan 3, 2017Updated 9 years ago
- A C++/CUDA toolkit for Transformer (NMT) Translator (Decoder)☆17Jan 7, 2019Updated 7 years ago
- SemEval2026 Task 3 DimABSA☆31Mar 13, 2026Updated last month
- Repository holding the code base to AC-SpGEMM : "Adaptive Sparse Matrix-Matrix Multiplication on the GPU"☆31Jul 7, 2020Updated 5 years ago
- A C++ library for computing large scale tensor contractions.☆38Jun 26, 2018Updated 7 years ago
- Step-by-step optimization of CUDA SGEMM☆455Mar 30, 2022Updated 4 years ago
- DL Dataloader Benchmarks☆20Jan 27, 2025Updated last year
- study of cutlass☆22Nov 10, 2024Updated last year
- University of Chinese Academy of Sciences, Advanced Computer Architecture course assignment: completing the full RTL-to-GDS flow with OpenROAD-flow☆30May 30, 2020Updated 5 years ago
- Benchmark tests supporting the TiledCUDA library.☆18Nov 19, 2024Updated last year
- An HBM FPGA based SpMV Accelerator☆18Aug 29, 2024Updated last year
- ☆11Oct 15, 2020Updated 5 years ago
- Official implementation of "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training"☆43Mar 4, 2024Updated 2 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs☆16Feb 28, 2019Updated 7 years ago
- Implementation for the protocols described in https://eprint.iacr.org/2023/1700☆14Jan 9, 2025Updated last year
- ☆25Apr 4, 2026Updated 2 weeks ago
- Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL)☆24Feb 12, 2024Updated 2 years ago
- An open source PDK using TIGFET 10nm devices.☆57Dec 19, 2022Updated 3 years ago
- ☆18Apr 8, 2022Updated 4 years ago
- ☆11Apr 27, 2013Updated 12 years ago
- ☆19May 17, 2016Updated 9 years ago
- A Log-Structured Merge-Tree storage engine☆16Sep 21, 2023Updated 2 years ago
- An FPGA integration and acceleration of the popular FAISS framework for approximate similarity search☆25Jul 20, 2019Updated 6 years ago
- A simple and efficient memory pool implemented in C++11.☆10Jun 2, 2022Updated 3 years ago