Bruce-Lee-LY/matrix_multiply

Several common methods of matrix multiplication are implemented on CPU and NVIDIA GPU using C++11 and CUDA.

☆14

Alternatives and similar repositories for matrix_multiply

Users that are interested in matrix_multiply are comparing it to the libraries listed below.

AyakaGEMM / Hands-on-GEMM
☆156 · Mar 18, 2024 · Updated 2 years ago
Bruce-Lee-LY / cuda_back2back_hgemm
Use tensor core to calculate back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instruction.
☆13 · Nov 3, 2023 · Updated 2 years ago
mayank31398 / ladder-residual-inference
☆14 · Jul 13, 2025 · Updated last year
thomasahle / cce
Clustered Compositional Embeddings
☆13 · Oct 25, 2023 · Updated 2 years ago
inria-thoth / csa
Official PyTorch implementation of Chromatic Graph Transformers
☆10 · Jun 14, 2023 · Updated 3 years ago
brando90 / ultimate-anatome
Ἀνατομή is a PyTorch library to analyze representation of neural networks
☆13 · Jan 31, 2024 · Updated 2 years ago
GuoTianYu2000 / Active-Dormant-Attention
Code and plots for "Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs"
☆11 · Dec 30, 2024 · Updated last year
wtong98 / mlp-icl
☆12 · Sep 16, 2024 · Updated last year
hummingtree / cuda-graph-with-dynamic-parameters
☆17 · Aug 9, 2022 · Updated 3 years ago
laplaceyc / Parallel-Programing
Course project for "NTHU Parallel Programming".
☆10 · Dec 5, 2017 · Updated 8 years ago
nhatpd / iADMM
iADMM for a low-rank representation optimization problem
☆13 · Feb 5, 2021 · Updated 5 years ago
Ratbuyer / h100-features
☆18 · Mar 12, 2025 · Updated last year
airjerry1216 / VLSI-Physical-Design-Automation
NTHU CS6135 VLSI Physical Design Automation
☆11 · Mar 12, 2022 · Updated 4 years ago
umich-sota / TF-as-SVM
☆12 · Jan 17, 2024 · Updated 2 years ago
wesg52 / llm-context-neurons
Find context neurons in Pythia models.
☆13 · Jun 13, 2023 · Updated 3 years ago
james-oldfield / MxD
[NeurIPS'25] Towards Interpretability Without Sacrifice: Faithful Dense Layer Decomposition with Mixture of Decoders
☆16 · May 28, 2025 · Updated last year
iamsubhranil / ThreadPool
A fast, small, efficient pthreads-based thread pool in C
☆16 · Mar 2, 2021 · Updated 5 years ago
BryceZhuo / HybridNorm
The official implementation of HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
☆19 · Mar 7, 2025 · Updated last year
lygyue / SimpleDeepLearningFramework
A simple CNN framework implemented in C++ for training on MNIST data.
☆10 · Mar 29, 2022 · Updated 4 years ago
ledmaster / unified-embeddings
Implementation of Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems
☆15 · Nov 11, 2023 · Updated 2 years ago
borjanG / 2023-transformers
Code for the paper "The emergence of clusters in self-attention dynamics".
☆17 · Dec 18, 2023 · Updated 2 years ago
theNefelibata / cpp_smart_ptr
Implementing smart pointers in C++ step by step
☆10 · Jun 6, 2021 · Updated 5 years ago
HobbitQia / 2023_ICS
☆10 · Jul 23, 2023 · Updated 3 years ago
alexzhang13 / Triton-Puzzles-Solutions
Personal solutions to the Triton Puzzles
☆21 · Jul 18, 2024 · Updated 2 years ago
panhomyoung / phySAT
Semi-Tensor Product based SAT and AllSAT solver that accepts CNF and circuit input.
☆17 · Aug 2, 2023 · Updated 2 years ago
anda522 / ThreadPool
A simple thread pool implemented in C++17 (with code explanations and background notes)
☆12 · Apr 14, 2023 · Updated 3 years ago
janzaremski / playwright-har
☆14 · Dec 13, 2023 · Updated 2 years ago
zhshi0816 / GDConvNet
☆15 · Jan 25, 2021 · Updated 5 years ago
activatedgeek / tight-pac-bayes
Code for PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization, NeurIPS 2022
☆18 · Nov 23, 2022 · Updated 3 years ago
crusoecloud / slurm
A slurm solution for Crusoe Cloud
☆15 · Jun 21, 2026 · Updated last month
Bruce-Lee-LY / cuda_hgemm
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆558 · Sep 8, 2024 · Updated last year
llmsresearch / scone
Implementation and evaluation of Scaling Embedding Layers in Language Models research paper
☆15 · Feb 2, 2026 · Updated 5 months ago
simonucl / PolySkill
Official implementation of PolySkill, a framework that enables web agents to learn generalizable and compositional skills through polymor…
☆15 · Jul 6, 2026 · Updated 3 weeks ago
sanketkkeni / PODEM-Algorithm-implementation
An ATPG tool using PODEM algorithm in C++ that generates a test to detect any given list of Single-Stuck-at Faults
☆11 · Oct 29, 2017 · Updated 8 years ago
aradha / deep_neural_feature_ansatz
Code for verifying the deep neural feature ansatz
☆22 · May 3, 2023 · Updated 3 years ago
yanghaku / tvm-rt-wasm
A high-performance, tiny TVM graph executor library written in C that can compile to WebAssembly and use CUDA/WebGPU as the accelerat…
☆13 · Aug 3, 2023 · Updated 2 years ago
lciernik / similarity_consistency
Representational similarity consistency across datasets and their driving factors.
☆16 · Jun 20, 2025 · Updated last year
benjamin-recht / mnist_1_pt_2
1.2% test error on MNIST using only least squares and numpy calls.
☆22 · Sep 13, 2023 · Updated 2 years ago
JeanKossaifi / zencfg
A Zen approach to configuring your Python project
☆17 · Feb 27, 2026 · Updated 5 months ago