Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
☆15 Feb 8, 2023 Updated 3 years ago
Alternatives and similar repositories for matrix_multiply
Users who are interested in matrix_multiply are comparing it to the repositories listed below.
- Exposes batch message receives (recvmmsg) ☆14 Aug 15, 2025 Updated 6 months ago
- Conditional Linear Dynamical Systems ☆15 Oct 7, 2025 Updated 5 months ago
- Dockerfile for building remix-ide docker image ☆10 Jan 17, 2020 Updated 6 years ago
- A Zen approach to configuring your Python project ☆15 Feb 27, 2026 Updated last week
- Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs" ☆10 Dec 30, 2024 Updated last year
- Tool to display/decode CPUINFO ☆10 Oct 22, 2018 Updated 7 years ago
- Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction. ☆13 Nov 3, 2023 Updated 2 years ago
- Principles and Methodologies for Serial Performance Optimization (OSDI '25) ☆25 Jun 5, 2025 Updated 9 months ago
- This repo is the "NTHU Parallel Programming" course project. ☆10 Dec 5, 2017 Updated 8 years ago
- Don't just regulate gradients like in Muon, regulate the weights too ☆31 Jul 30, 2025 Updated 7 months ago
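- NTHU CS6135 VLSI Physical Design Automation ☆12 Mar 12, 2022 Updated 3 years ago
- 🌟✨A Python crawler built purely on requests, designed for fetching Pinduoduo product categories and detail pages!🛒🎉 Say goodbye👋 to tedious browser-automation code and use this lightweight🎈, efficient🚀 tool to easily get the information you need.📚🌈 ☆12 Aug 31, 2023 Updated 2 years ago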
- iADMM for a low-rank representation optimization problem ☆13 Feb 5, 2021 Updated 5 years ago
- Research & Development for Golem project ☆21 Dec 10, 2018 Updated 7 years ago
- Class of 2020 course project: solving the SAT problem with the DPLL algorithm ☆12 Nov 3, 2021 Updated 4 years ago
- ☆14 Dec 13, 2023 Updated 2 years ago
- The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization ☆18 Mar 7, 2025 Updated last year
- A simple CNN framework implemented in C++ to train on MNIST data. Done! ☆10 Mar 29, 2022 Updated 3 years ago
- Implementation of Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems ☆14 Nov 11, 2023 Updated 2 years ago
- ☆12 Sep 16, 2024 Updated last year
- ☆15 Jul 13, 2025 Updated 7 months ago
- An ATPG tool using PODEM algorithm in C++ that generates a test to detect any given list of Single-Stuck-at Faults ☆11 Oct 29, 2017 Updated 8 years ago
- Avrio's core code written in Rust. ☆17 Sep 12, 2022 Updated 3 years ago
- A High performance and tiny TVM graph executor library written in C which can compile to WebAssembly and use CUDA/WebGPU as the accelerat… ☆12 Aug 3, 2023 Updated 2 years ago
- ☆12 Mar 19, 2021 Updated 4 years ago
- Find context neurons in Pythia models. ☆13 Jun 13, 2023 Updated 2 years ago
- ☆15 Jan 26, 2026 Updated last month
- Static timing analysis (STA) is a method of validating the timing performance of a design by checking all possible paths for timing viola… ☆16 Oct 4, 2022 Updated 3 years ago
- ☆10 Jul 23, 2023 Updated 2 years ago
- Command-line script to access global proxy via PKU VPN ☆14 Sep 10, 2022 Updated 3 years ago
- Ring-Signature using secp256k1 in Solidity ☆13 Jul 6, 2018 Updated 7 years ago
- ☆12 Jan 17, 2024 Updated 2 years ago
- Unofficial Scalable-Softmax Is Superior for Attention ☆20 May 30, 2025 Updated 9 months ago
- RPG^2 is a pure-software system that operates on running C/C++ programs, profiling them, injecting prefetch instructions, and then tuning … ☆12 May 15, 2024 Updated last year
- Semi-Tensor Product based SAT and AllSAT solver that can solve CNF and circuit inputs. ☆17 Aug 2, 2023 Updated 2 years ago
- ☆22 Feb 11, 2024 Updated 2 years ago
- NTHU CS5422 Parallel Programming Course Projects (include Odd-Even Sort, Mandelbrot Set, All-Pairs Shortest Path, Blocked All-Pairs Short… ☆13 Sep 7, 2025 Updated 6 months ago
- A slurm solution for Crusoe Cloud ☆13 Updated this week
- ☆14 Jul 6, 2021 Updated 4 years ago