matrix multiplication in CUDA
☆125Aug 10, 2023Updated 2 years ago
Alternatives and similar repositories for matrix-cuda
Users that are interested in matrix-cuda are comparing it to the libraries listed below.
- ☆12Aug 22, 2023Updated 2 years ago
- Large matrix multiplication in CUDA☆17Oct 20, 2023Updated 2 years ago
- Using the pvanet framework to train mobilenet-v2 for object detection, paper: https://arxiv.org/abs/1611.08588☆13Feb 13, 2019Updated 7 years ago
- I implemented a parallel algorithm for matrix inversion based on Gauss-Jordan elimination.☆46Nov 17, 2015Updated 10 years ago
- Matrix Multiplication on GPU using Shared Memory considering Coalescing and Bank Conflicts☆26Aug 29, 2022Updated 3 years ago
- This repository contains my implementation of a shape-constrained network which predicts up to 170 FPS☆12Feb 12, 2019Updated 7 years ago
- This is a c++ implementation of an LSTM Neural Network parallelized for a GPU using CUDA☆25Oct 29, 2017Updated 8 years ago
- A high performance implementation of kmeans algorithm with cuda☆18Sep 7, 2014Updated 11 years ago
- Official Implementation of "RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs"☆28Jul 23, 2025Updated 10 months ago
- Convolutional neural network (VGG19 model) accelerated with CUDA☆12Jun 11, 2018Updated 7 years ago
- CUDA implementation of Image Completion Using Global Optimization(Nikos Komodakis and Georgios Tziritas)☆21Mar 19, 2020Updated 6 years ago
- ☆121Apr 11, 2024Updated 2 years ago
- Official Implementation of "LinGCN: Structural Linearized Graph Convolutional Network for Homomorphically Encrypted Inference"☆25Nov 12, 2023Updated 2 years ago
- ☆12Nov 8, 2019Updated 6 years ago
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program☆23Jan 11, 2024Updated 2 years ago
- A simple high performance CUDA GEMM implementation.☆434Jan 4, 2024Updated 2 years ago
- HCC Sample Applications☆13Jan 3, 2017Updated 9 years ago
- A C++/CUDA toolkit for Transformer (NMT) Translator (Decoder)☆17Jan 7, 2019Updated 7 years ago
- Distributed k-nearest Neighbors using Locality Sensitive Hashing and SYCL☆10Jun 7, 2021Updated 4 years ago
- Automatic ReLU Reduction☆15Dec 20, 2023Updated 2 years ago
- DL Dataloader Benchmarks☆20Jan 27, 2025Updated last year
- Step-by-step optimization of CUDA SGEMM☆469Mar 30, 2022Updated 4 years ago
- study of cutlass☆22Nov 10, 2024Updated last year
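- Coursework for the Advanced Computer Architecture course at the University of Chinese Academy of Sciences: completing the full RTL-to-GDS flow with OpenROAD-flow☆30May 30, 2020Updated 6 years ago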
- Benchmark tests supporting the TiledCUDA library.☆19Nov 19, 2024Updated last year
- Using C++ magic to capture CUDA kernels and tune them with Kernel Tuner☆21Sep 12, 2025Updated 8 months ago
- ☆11Oct 15, 2020Updated 5 years ago
- Official implementation of "MaxK-GNN: Extremely Fast GPU Kernel Design for Accelerating Graph Neural Networks Training"☆43Mar 4, 2024Updated 2 years ago
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs☆16Feb 28, 2019Updated 7 years ago
- ☆10May 21, 2026Updated last week
- Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL)☆24Feb 12, 2024Updated 2 years ago
- ☆25Apr 4, 2026Updated last month
- To be a next-generation DL-based phenotype prediction from genome mutations.☆19May 17, 2021Updated 5 years ago
- ☆18Apr 8, 2022Updated 4 years ago
- A 20M RWKV v6 can do nonogram☆13Oct 18, 2024Updated last year
- ☆12Apr 27, 2013Updated 13 years ago
- ☆19May 17, 2016Updated 10 years ago
- Custom-Precision Floating-point numbers.☆44Mar 1, 2026Updated 2 months ago
- A simple and efficient memory pool implemented in C++11.☆10Jun 2, 2022Updated 3 years ago