YuxueYang1204/CudaDemo

Implement custom operators in PyTorch with CUDA/C++

☆77

Alternatives and similar repositories for CudaDemo

Users interested in CudaDemo are comparing it to the repositories listed below.

GinsengHoney / CUDA_Study
☆18 · Jul 31, 2023 · Updated 2 years ago
EfficientLLMSys / MuxServe
☆15 · Jun 26, 2024 · Updated 2 years ago
linxihui / dkernel
☆22 · Apr 17, 2025 · Updated last year
JJXiangJiaoJun / cutlass_gemv
GEMV implementation with CUTLASS
☆21 · Aug 21, 2025 · Updated 11 months ago
shouxieai / bevfusion_02hero
☆17 · Nov 14, 2023 · Updated 2 years ago
melonedo / algebraic-layouts
☆23 · Aug 20, 2025 · Updated 11 months ago
xie-lab-ml / piecewise-sparse-attention
Piecewise Sparse Attention Is Wiser for Efficient Diffusion Transformers
☆37 · Jul 1, 2026 · Updated 3 weeks ago
wangzyon / trt_learn
TensorRT wrappers: learning, rewriting, and practice.
☆31 · Oct 19, 2022 · Updated 3 years ago
storrrrrrrrm / tensorrt_smoke
C++ inference code for the SMOKE 3D object detection model
☆39 · Dec 1, 2022 · Updated 3 years ago
OpenRL-Lab / PyTorch_Tutorial
PyTorch usage tips and tutorials
☆12 · Apr 17, 2023 · Updated 3 years ago
HaiPenglai / bilibili_DigitalCircuits
Bilibili: digital circuits lecture slides (PPT)
☆11 · Feb 19, 2024 · Updated 2 years ago
nhynes / chisel3-axi
Chisel3 AXI4-{Lite, Full, Stream} Definitions
☆15 · Dec 31, 2018 · Updated 7 years ago
MAhaitao999 / CUDA_Programming
Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Fundamentals and Practice)
☆172 · Apr 28, 2022 · Updated 4 years ago
JuniMay / llm.rs
An attempt to migrate Karpathy's llm.c to safe Rust.
☆13 · Jun 4, 2024 · Updated 2 years ago
opencv-java / opencv
Open Source Computer Vision Library
☆13 · Oct 22, 2015 · Updated 10 years ago
terminal-agent / reptile
💻 Terminal-Agent with Human-in-the-Loop Learning
☆41 · Jan 16, 2026 · Updated 6 months ago
godweiyang / NN-CUDA-Example
Several simple examples for popular neural network toolkits calling custom CUDA operators.
☆1,538 · Apr 29, 2021 · Updated 5 years ago
preacher-1 / MLA_tutorial
From MHA, MQA, and GQA to MLA by 苏剑林 (Su Jianlin), with code
☆51 · Feb 19, 2025 · Updated last year
xgqdut2016 / cuda_code
Easy CUDA code
☆101 · Dec 24, 2024 · Updated last year
DavidZWZ / LightGODE
[CIKM 2024] Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation
☆15 · Aug 11, 2024 · Updated last year
Bruce-Lee-LY / matrix_multiply
Several common methods of matrix multiplication are implemented on CPU and Nvidia GPU using C++11 and CUDA.
☆14 · Feb 8, 2023 · Updated 3 years ago
KerfuffleV2 / ggml-sys-bleedingedge
Bleeding-edge low-level Rust bindings for GGML
☆18 · Jun 26, 2024 · Updated 2 years ago
suhipek / nku_os
Nankai University operating systems course labs (UCore)
☆11 · Oct 16, 2022 · Updated 3 years ago
xgqdut2016 / hpc2torch
☆40 · Jun 25, 2026 · Updated last month
daajoe / GPUSAT
☆12 · Sep 29, 2021 · Updated 4 years ago
Ricardokevins / NJU_PA2019
Computer Systems Fundamentals PA (programming assignments), Department of Computer Science and Technology, Nanjing University, 2019
☆14 · Sep 18, 2020 · Updated 5 years ago
wiltchamberian / Zeta
Zeta is a lightweight deep learning framework
☆24 · Apr 5, 2026 · Updated 3 months ago
NonvolatileMemory / flash_attn_gqa
Triton version of GQA flash attention, based on the tutorial
☆12 · Aug 4, 2024 · Updated last year
xAlg-ai / HashAttention-1.0
☆18 · Sep 23, 2025 · Updated 10 months ago
dsl-learn / cuda-magic
Fake CUTLASS to get performance
☆26 · Apr 28, 2026 · Updated 2 months ago
IVL-PKU / easyHumanNeRF
End-to-end implementation of HumanNeRF
☆15 · Sep 5, 2023 · Updated 2 years ago
xuefeng-cvr / Tiny-Obstacle-Discovery-ROS
Official Python/ROS Implementation for "A Novel Multi-layer Framework for Tiny Obstacle Discovery", ICRA 2019
☆14 · Jul 26, 2020 · Updated 6 years ago
RohanNagar / parallel-logic-networks
Gate-Level Simulation on a GPU
☆10 · Nov 22, 2016 · Updated 9 years ago
HorizonRobotics / GLS
Geometry-aware 3D Language Gaussian Splatting (arXiv 2024)
☆17 · Jan 7, 2026 · Updated 6 months ago
justinshenk / video-pose-extractor
Dockerfile and instructions for human pose estimation implementation using Caffe, OpenCV 3.1.0 and Python 2.7.
☆12 · Mar 3, 2019 · Updated 7 years ago
Learning4Optimization-HUST / Pointerformer
☆11 · Nov 29, 2022 · Updated 3 years ago
georgia-tech-synergy-lab / SparseAccelerator-RTL
Accelerator RTL inspired by VEGETA [HPCA'23] and MicroScopiQ [ISCA'25]
☆15 · Nov 11, 2025 · Updated 8 months ago
tpoisonooo / how-to-optimize-gemm
Row-major matmul optimization
☆743 · May 14, 2026 · Updated 2 months ago
UNITES-Lab / Occult
[ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…
☆13 · Apr 17, 2025 · Updated last year