tongzhou80/nanoPyC

☆69

Alternatives and similar repositories for nanoPyC

Users who are interested in nanoPyC are comparing it to the repositories listed below.

lixiuhong / batched_gemm
☆40 · Feb 28, 2020 · Updated 6 years ago
zeroine / cutlass-cute-sample
☆49 · Apr 15, 2024 · Updated 2 years ago
cherichy / tilecute
☆32 · Jul 2, 2025 · Updated last year
openmlsys / openmlsys-cuda
Tutorials for writing high-performance GPU operators in AI frameworks.
☆135 · Aug 12, 2023 · Updated 2 years ago
QianyanTech / NBAssembler
Assembler and decompiler for NVIDIA (Maxwell, Pascal, Volta, Turing, Ampere) GPUs.
☆96 · Feb 23, 2023 · Updated 3 years ago
caijixueIT / CUDA_Learning_for_Freshman
☆14 · Nov 3, 2025 · Updated 8 months ago
Kedreamix / pytorch-cppcuda-tutorial
Tutorial for writing custom PyTorch C++/CUDA kernels, applied to volume rendering (NeRF)
☆29 · Dec 12, 2023 · Updated 2 years ago
Liu-xiandong / How_to_optimize_in_GPU
This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several…
☆1,332 · Jul 29, 2023 · Updated 2 years ago
TiledTensor / TiledBench
Benchmark tests supporting the TiledCUDA library.
☆19 · Nov 19, 2024 · Updated last year
tpoisonooo / how-to-optimize-gemm
Row-major matmul optimization
☆743 · May 14, 2026 · Updated 2 months ago
BBuf / how-to-optim-algorithm-in-cuda
How to optimize some algorithms in CUDA.
☆3,146 · Updated this week
iclementine / optimize_softmax
Optimize softmax in Triton in many cases
☆24 · Sep 6, 2024 · Updated last year
LeiWang1999 / TVM.CMakeExtend
Tutorials on extending and importing TVM with a CMake include dependency.
☆16 · Oct 11, 2024 · Updated last year
mlc-ai / mlc-zh
☆635 · Apr 5, 2026 · Updated 3 months ago
lucifer1004 / VeloQ
Agent-friendly GPU profile-query CLI
☆106 · Jun 22, 2026 · Updated last month
ics-nju-wl / icspa-public
ICSPA for MOOC
☆53 · Jun 12, 2023 · Updated 3 years ago
Triang-jyed-driung / i8muon
Muon in Int8 Precision Made Possible
☆20 · Jun 18, 2026 · Updated last month
flame / how-to-optimize-gemm
☆2,022 · Jul 29, 2023 · Updated 2 years ago
BBuf / tvm_mlir_learn
A collection of compiler learning resources.
☆2,758 · May 20, 2026 · Updated 2 months ago
MegEngine / MegPeak
☆256 · Sep 15, 2023 · Updated 2 years ago
MARD1NO / CUDA-PPT
☆136 · Apr 16, 2026 · Updated 3 months ago
MegEngine / MegCC
MegCC is a deep learning model compiler with an ultra-lightweight runtime that is efficient and easy to port.
☆483 · Oct 23, 2024 · Updated last year
StrongSpoon / tvm.schedule
Examples for the TVM schedule API
☆101 · Jun 12, 2023 · Updated 3 years ago
array2d / deepx
Large-scale Auto-Distributed Training/Inference Unified Framework | Memory-Compute-Control Decoupled Architecture | Multi-language SDK & …
☆54 · Jul 17, 2026 · Updated last week
wangsiping97 / GPU-Tutorials
Tutorials on GPU programming. Reading notes.
☆19 · Apr 27, 2023 · Updated 3 years ago
ChenCVer / python_cpp_extension
C++ and CUDA extensions for Python/PyTorch and GPU-accelerated augmentation.
☆34 · Nov 30, 2022 · Updated 3 years ago
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 · Jan 28, 2025 · Updated last year
KnowingNothing / MatmulTutorial
An Easy-to-understand TensorOp Matmul Tutorial
☆445 · Mar 5, 2026 · Updated 4 months ago
Yinghan-Li / YHs_Sample
Yinghan's Code Sample
☆365 · Jul 25, 2022 · Updated 3 years ago
Syencil / Programming_Massively_Parallel_Processors
Code and notes for the six major CUDA parallel computing patterns
☆63 · Jul 30, 2020 · Updated 5 years ago
merrymercy / awesome-tensor-compilers
A list of awesome compiler projects and papers for tensor computation and deep learning.
☆2,768 · Oct 19, 2024 · Updated last year
torchpipe / torchpipe
Serving inside PyTorch
☆169 · Jun 23, 2026 · Updated last month
KuangjuX / cuda-evolve-oss
Autonomous GPU kernel optimization system driven by AI agents.
☆31 · Mar 29, 2026 · Updated 3 months ago
yinuotxie / Efficient-LLM-Inferencing-on-GPUs
Penn CIS 5650 (GPU Programming and Architecture) Final Project
☆45 · Dec 11, 2023 · Updated 2 years ago
uwsampl / sparsetir-artifact
Repository for artifact evaluation of ASPLOS 2023 paper "SparseTIR: Composable Abstractions for Sparse Compilation in Deep Learning"
☆25 · Feb 24, 2023 · Updated 3 years ago
openmlsys / openmlsys
"Machine Learning Systems: Design and Implementation" (V2 is launching soon)
☆4,825 · Mar 15, 2026 · Updated 4 months ago
parallel101 / hw01
High-Performance Parallel Programming and Optimization - Lecture 01 homework
☆27 · Aug 12, 2024 · Updated last year
habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆29 · Mar 22, 2026 · Updated 4 months ago
Oneflow-Inc / DLPerf
Deep Learning Framework Performance Profiling Toolkit
☆292 · Mar 28, 2022 · Updated 4 years ago