openmlsys/openmlsys-cuda

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/openmlsys/openmlsys-cuda)

openmlsys / openmlsys-cuda

Tutorials for writing high-performance GPU operators in AI frameworks.

☆135

Alternatives and similar repositories for openmlsys-cuda

Users that are interested in openmlsys-cuda are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

AlexwellChen / Toy_ML_Framework
View on GitHub
☆11May 16, 2026Updated 2 months ago
openmlsys / openmlsys
View on GitHub
《Machine Learning Systems: Design and Implementation》 (V2 is launching soon）
☆4,824Mar 15, 2026Updated 4 months ago
richjjj / cuvid-tensorrt-multi
View on GitHub
ffmpeg+cuvid+tensorrt+multicamera
☆12Dec 31, 2024Updated last year
li199603 / sgemm_with_cuda
View on GitHub
SGEMM optimization with cuda step by step
☆23Mar 23, 2024Updated 2 years ago
Bruce-Lee-LY / flash_attention_inference
View on GitHub
Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.
☆45Feb 27, 2025Updated last year
Deploy to Railway using AI coding agents - Free Credits Offer • Ad
Use Claude Code, Codex, OpenCode, and more. Autonomous software development now has the infrastructure to match with Railway.
Yinghan-Li / YHs_Sample
View on GitHub
Yinghan's Code Sample
☆365Jul 25, 2022Updated 3 years ago
reed-lau / cute-gemm
View on GitHub
☆186May 11, 2026Updated 2 months ago
BBuf / tvm_mlir_learn
View on GitHub
compiler learning resources collect.
☆2,759May 20, 2026Updated 2 months ago
Qingrenn / mmdeploy-summer-camp
View on GitHub
🐱 ncnn int8 模型量化评估
☆14Oct 10, 2022Updated 3 years ago
matrix97317 / OneNeuralNetwork
View on GitHub
This is a cross-chip platform collection of operators and a unified neural network library.
☆17Nov 3, 2023Updated 2 years ago
KuangjuX / cu-x
View on GitHub
🎉My Collections of CUDA Kernels~
☆11Jun 25, 2024Updated 2 years ago
tongzhou80 / nanoPyC
View on GitHub
☆69Mar 19, 2023Updated 3 years ago
BBuf / how-to-learn-deep-learning-framework
View on GitHub
how to learn PyTorch and OneFlow
☆502May 20, 2026Updated 2 months ago
MARD1NO / CUDA-PPT
View on GitHub
☆136Apr 16, 2026Updated 3 months ago
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
MegEngine / mperf
View on GitHub
mperf是一个面向移动/嵌入式平台的算子性能调优工具箱
☆196Aug 17, 2023Updated 2 years ago
BBuf / how-to-optim-algorithm-in-cuda
View on GitHub
how to optimize some algorithm in cuda.
☆3,142Updated this week
flashinfer-ai / cutlass-viz
View on GitHub
☆65Apr 26, 2025Updated last year
Bruce-Lee-LY / cuda_hgemm
View on GitHub
Several optimization methods of half-precision general matrix multiplication (HGEMM) using tensor core with WMMA API and MMA PTX instruct…
☆556Sep 8, 2024Updated last year
Liu-xiandong / How_to_optimize_in_GPU
View on GitHub
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,329Jul 29, 2023Updated 2 years ago
MegEngine / MegCC
View on GitHub
MegCC是一个运行时超轻量，高效，移植简单的深度学习模型编译器
☆482Oct 23, 2024Updated last year
weishengying / cutlass_flash_atten_fp8
View on GitHub
使用 cutlass 仓库在 ada 架构上实现 fp8 的 flash attention
☆82Aug 12, 2024Updated last year
lenLRX / AmpereSparseMatmul
View on GitHub
study of Ampere' Sparse Matmul
☆18Jan 10, 2021Updated 5 years ago
tlc-pack / cutlass_fpA_intB_gemm
View on GitHub
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96Jun 21, 2026Updated last month
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
OpenPPL / ppl.llm.kernel.cuda
View on GitHub
☆150Jan 9, 2025Updated last year
XiaoSongXS / dgemm-knl
View on GitHub
DGEMM on KNL, achieve 75% MKL
☆18May 19, 2022Updated 4 years ago
DD-DuDa / Cute-Learning
View on GitHub
Examples of CUDA implementations by Cutlass CuTe
☆279Jul 1, 2025Updated last year
richjjj / duscratch
View on GitHub
搜藏的希望的代码片段
☆13Jun 6, 2023Updated 3 years ago
zjhellofss / KuiperInfer
View on GitHub
校招、秋招、春招、实习好项目！带你从零实现一个高性能的深度学习推理库，支持大模型 llama2 、Unet、Yolov5、Resnet等模型的推理。Implement a high-performance deep learning inference library st…
☆3,462Jun 22, 2025Updated last year
masahi / tvm-cutlass-eval
View on GitHub
☆41Mar 31, 2022Updated 4 years ago
66RING / tiny-flash-attention
View on GitHub
flash attention tutorial written in python, triton, cuda, cutlass
☆527Jan 20, 2026Updated 6 months ago
lucasjinreal / wnnx_models
View on GitHub
Various test models in WNNX format. It can view with `pip install wnetron && wnetron`
☆12Jun 22, 2022Updated 4 years ago
TRT2022 / ControlNet_TensorRT
View on GitHub
天池 NVIDIA TensorRT Hackathon 2023 —— 生成式AI模型优化赛初赛第三名方案
☆50Aug 16, 2023Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
JiangLiSJTU / token-ring
View on GitHub
☆13Jan 7, 2025Updated last year
Bruce-Lee-LY / cutlass_gemm
View on GitHub
Multiple GEMM operators are constructed with cutlass to support LLM inference.
☆20Aug 3, 2025Updated 11 months ago
mlc-ai / notebooks
View on GitHub
☆228Nov 22, 2024Updated last year
RussWong / CUDATutorial
View on GitHub
A CUDA tutorial to make people learn CUDA program from 0
☆279Jul 9, 2024Updated 2 years ago
Oneflow-Inc / oneflow-xrt
View on GitHub
☆24Apr 25, 2023Updated 3 years ago
Jack47 / hack-SysML
View on GitHub
The road to hack SysML and become an system expert
☆516Sep 25, 2024Updated last year
Cjkkkk / CUDA_gemm
View on GitHub
A simple high performance CUDA GEMM implementation.
☆437Jan 4, 2024Updated 2 years ago