Several implementations of sparse matrix multiplication in CUDA. All matrices are stored in CSR (compressed sparse row) format. The code contains several CUDA kernels for two products: sparse matrix times dense vector, and sparse matrix times sparse matrix.
☆17Nov 15, 2010Updated 15 years ago
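For readers unfamiliar with CSR, the sparse matrix times dense vector product described above can be sketched on the CPU as follows. This is a minimal reference sketch, not code from CudaDotProd: the `CsrMatrix` struct, its field names, and `spmv_csr` are hypothetical names chosen for illustration. It assumes the usual CSR layout, in which `row_ptr[i]` and `row_ptr[i + 1]` delimit the stored entries of row `i`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSR container; the field names are illustrative only.
struct CsrMatrix {
    std::size_t rows = 0;
    std::size_t cols = 0;
    std::vector<std::size_t> row_ptr;  // size rows + 1; row i spans [row_ptr[i], row_ptr[i + 1])
    std::vector<std::size_t> col_idx;  // column index of each stored value
    std::vector<double> values;        // stored (nonzero) values, row by row
};

// Computes y = A * x for a CSR matrix A and a dense vector x.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x) {
    assert(a.row_ptr.size() == a.rows + 1);
    assert(a.col_idx.size() == a.values.size());
    assert(x.size() == a.cols);

    std::vector<double> y(a.rows, 0.0);
    // Rows are independent of each other, so a simple GPU kernel
    // could assign one thread to each iteration of this outer loop.
    for (std::size_t i = 0; i < a.rows; ++i) {
        double sum = 0.0;
        for (std::size_t k = a.row_ptr[i]; k < a.row_ptr[i + 1]; ++k) {
            sum += a.values[k] * x[a.col_idx[k]];
        }
        y[i] = sum;
    }
    return y;
}
```

Because each row is processed independently, the outer loop is the natural unit of parallelism. How the repository's kernels actually distribute rows across threads is not stated here.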
Alternatives and similar repositories for CudaDotProd
Users who are interested in CudaDotProd are comparing it to the repositories listed below
- Gale&Church (1993) sentence alignment☆16May 9, 2020Updated 5 years ago
- record power consumption on thinkpads and create a gnuplot graph☆10May 8, 2019Updated 6 years ago
- http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/36266.pdf☆14Apr 25, 2012Updated 13 years ago
- ☆13Feb 18, 2026Updated 2 weeks ago
- An implementation of the Pregel graph processing system on the Spark cluster computing framework. Merged into Spark; please see:☆11Apr 9, 2011Updated 14 years ago
- ☆10Aug 4, 2022Updated 3 years ago
- Code for "On Long-Tailed Phenomena in NMT".☆10Jan 10, 2021Updated 5 years ago
- Tomasulo Simulator written in React as the project for Computer Architecture course, Spring 2019, Tsinghua University☆11Jun 9, 2019Updated 6 years ago
- A Book Recommendation System Based on Knowledge Graphs and User Comments 基于知识图谱和用户评论的图书推荐系统☆18Feb 13, 2026Updated 3 weeks ago
- Code for the paper Faster Phrase-Based Decoding by Refining Feature State☆14Jan 9, 2023Updated 3 years ago
- A Multi-GPU version for CoreNeuron☆11Oct 13, 2017Updated 8 years ago
- Bilingual sentence aligner (Gale & Church, 1993)☆14Jan 8, 2026Updated 2 months ago
- Grounding statistical machine translation with semantic parsing☆14May 13, 2015Updated 10 years ago
- High-performance CUDA kernels for real-time financial low latency inference, optimized for both consumer and datacenter GPUs.☆20Jul 25, 2025Updated 7 months ago
- Zero-Overhead bare-metal GPGPU library for C++ on Windows.☆15Jan 29, 2017Updated 9 years ago
- ☆12Dec 9, 2015Updated 10 years ago
- UltraFast GPU Grammar eXtractor for Machine Translation (He et al., TACL 2015 & NAACL 2013)☆12Jun 19, 2015Updated 10 years ago
- Spoken Language Translation System☆14Jun 25, 2019Updated 6 years ago
- Deep learning model of machine translation using attentional and structural biases☆13Jul 21, 2017Updated 8 years ago
- Packaging utilities for GPL compression libraries in Hadoop☆34Jun 7, 2012Updated 13 years ago
- ☆14Aug 27, 2014Updated 11 years ago
- Unit benchmarks of CUDA event APIs.☆17Apr 23, 2024Updated last year
- API backend for EESAST☆13Updated this week
- Benchmark for popular FFT libraries - fftw | cufftw | cufft☆18Dec 8, 2018Updated 7 years ago
- An Adaptor Grammar model implementation in Python.☆17Jan 31, 2020Updated 6 years ago
- Automata Benchmark Suite☆23Oct 23, 2023Updated 2 years ago
- A CUDA-C implementation of FOFE and FSMN☆19Aug 5, 2016Updated 9 years ago
- What You Say Is What You Did☆23Sep 24, 2019Updated 6 years ago
- FIPS 202 compliant SHA-3 core in Verilog☆23Oct 8, 2020Updated 5 years ago
- SpMV using CUDA☆20Mar 5, 2018Updated 8 years ago
- Microsoft Speech Language Translation (MSLT) Corpus☆19Sep 18, 2017Updated 8 years ago
- Multi-modal Bayesian embedding model☆18Jun 30, 2016Updated 9 years ago
- ☆18Oct 5, 2017Updated 8 years ago
- Memory consistency modelling using Alloy☆31Dec 16, 2020Updated 5 years ago
- This application shuffles the lines of the input file, optionally skipping the header. It is optimized for files bigger than available RAM.☆25Jan 9, 2017Updated 9 years ago
- The system call intercepting library☆23Sep 18, 2022Updated 3 years ago
- Code for the paper "Balancing Training for Multilingual Neural Machine Translation, ACL 2020"☆23May 26, 2021Updated 4 years ago
- CUDA Sparse-Matrix Vector Multiplication, using Sliced Coordinate format☆22Jun 8, 2018Updated 7 years ago
- ☆22Mar 27, 2022Updated 3 years ago