lzhengchun / matrix-cuda
matrix multiplication in CUDA
☆125 · Aug 10, 2023 · Updated 2 years ago
Alternatives and similar repositories for matrix-cuda
Users who are interested in matrix-cuda are comparing it to the libraries listed below.
- Musings in GEMM (General Matrix Multiplication) ☆14 · Dec 14, 2025 · Updated 2 months ago
- This is a C++ implementation of an LSTM Neural Network parallelized for a GPU using CUDA ☆25 · Oct 29, 2017 · Updated 8 years ago
- ☆12 · Aug 22, 2023 · Updated 2 years ago
- I implemented a parallel algorithm for matrix inversion based on Gauss-Jordan elimination. ☆45 · Nov 17, 2015 · Updated 10 years ago
- A 20M RWKV v6 can do nonogram ☆14 · Oct 18, 2024 · Updated last year
- A Vector Caching Scheme for Streaming FPGA SpMV Accelerators ☆10 · Sep 7, 2015 · Updated 10 years ago
- Matrix Multiplication on GPU using Shared Memory considering Coalescing and Bank Conflicts ☆25 · Aug 29, 2022 · Updated 3 years ago
- ☆11 · Oct 15, 2020 · Updated 5 years ago
- This repository contains my implementation of a shape-constrained network which predicts up to 170 FPS ☆12 · Feb 12, 2019 · Updated 7 years ago
- Repository holding the code base to AC-SpGEMM: "Adaptive Sparse Matrix-Matrix Multiplication on the GPU" ☆31 · Jul 7, 2020 · Updated 5 years ago
- A high performance implementation of the k-means algorithm with CUDA ☆18 · Sep 7, 2014 · Updated 11 years ago
- ☆18 · Apr 8, 2022 · Updated 3 years ago
- Benchmark tests supporting the TiledCUDA library. ☆18 · Nov 19, 2024 · Updated last year
- cuASR: CUDA Algebra for Semirings ☆44 · Aug 22, 2022 · Updated 3 years ago
- Mamba support for transformer lens ☆19 · Sep 17, 2024 · Updated last year
- Escoin: Efficient Sparse Convolutional Neural Network Inference on GPUs ☆16 · Feb 28, 2019 · Updated 6 years ago
- LLM4HWDesign Starting Toolkit ☆19 · Oct 4, 2024 · Updated last year
- This repo contains the dataset for paper: Creating a Dataset Supporting Translation Between OpenMP Fortran and C++ Code ☆15 · Dec 1, 2023 · Updated 2 years ago
- SRL4ORL: Improving Opinion Role Labeling Using Multi-Task Learning With Semantic Role Labeling ☆14 · Oct 10, 2018 · Updated 7 years ago
- To be a next-generation DL-based phenotype prediction from genome mutations. ☆19 · May 17, 2021 · Updated 4 years ago
- image to column ☆30 · Jul 15, 2014 · Updated 11 years ago
- ☆120 · Apr 11, 2024 · Updated last year
- CUDA official sample codes ☆371 · Oct 6, 2015 · Updated 10 years ago
- study of cutlass ☆22 · Nov 10, 2024 · Updated last year
- Official Implementation of "RTop-K: Ultra-Fast Row-Wise Top-K Selection for Neural Network Acceleration on GPUs" ☆29 · Jul 23, 2025 · Updated 6 months ago
- A simple high performance CUDA GEMM implementation. ☆426 · Jan 4, 2024 · Updated 2 years ago
- An open source PDK using TIGFET 10nm devices. ☆56 · Dec 19, 2022 · Updated 3 years ago
- ☆19 · May 17, 2016 · Updated 9 years ago
- Trace Replay and Network Simulation Framework ☆21 · Apr 14, 2021 · Updated 4 years ago
- CSR-based SpGEMM on NVIDIA and AMD GPUs ☆47 · Apr 9, 2016 · Updated 9 years ago
- Benchmark suite containing cache filtered traces for use with Ramulator. These include some of the workloads used in our SIGMETRICS 2019 … ☆23 · Oct 9, 2020 · Updated 5 years ago
- End to End steps for adding custom ops in PyTorch. ☆24 · Aug 20, 2020 · Updated 5 years ago
- ☆22 · Feb 18, 2025 · Updated 11 months ago
- ngAP's artifact for ASPLOS'24 ☆25 · Jul 29, 2025 · Updated 6 months ago
- SocksDirect code repository ☆19 · Jun 26, 2022 · Updated 3 years ago
- An FPGA integration and acceleration of the popular FAISS framework for approximate similarity search ☆25 · Jul 20, 2019 · Updated 6 years ago
- Resources for the introductory GPU programming course of the HPC-AI specialized master's program ☆23 · Jan 11, 2024 · Updated 2 years ago
- Parallelized and vectorized SpMV on Intel Xeon Phi (Knights Landing, AVX512, KNL) ☆24 · Feb 12, 2024 · Updated 2 years ago
- Step-by-step optimization of CUDA SGEMM ☆431 · Mar 30, 2022 · Updated 3 years ago