KuangjuX/cu-x

🎉My Collections of CUDA Kernels~

☆11

Alternatives and similar repositories for cu-x

Users that are interested in cu-x are comparing it to the libraries listed below.

TiledTensor / TiledKernel
TiledKernel is a code generation library based on macro kernels and memory hierarchy graph data structure.
☆19 · May 12, 2024 · Updated 2 years ago
luliyucoordinate / flash-attention-minimal
Flash Attention in ~100 lines of CUDA (forward pass only)
☆12 · Jun 10, 2024 · Updated 2 years ago
HanGuo97 / hilt
☆40 · Dec 14, 2025 · Updated 7 months ago
YangLinzhuo / cuda-sgemm-optimization
CUDA SGEMM optimization note
☆15 · Oct 31, 2023 · Updated 2 years ago
YdrMaster / dtb-walker
Traverses device tree binary objects
☆16 · Jun 20, 2026 · Updated 3 weeks ago
CircuitCoder / ChannelOS
What if everything is a io_uring?
☆17 · Nov 10, 2022 · Updated 3 years ago
YdrMaster / llama2.rs
Experiment: llama2 inference implemented in Rust
☆17 · Feb 23, 2024 · Updated 2 years ago
stemnic / rustyvisor
Hypervisor written in Rust for the RISC-V 1.0 hypervisor extension
☆16 · Oct 21, 2024 · Updated last year
KuangjuX / TileGraph
TileGraph is an experimental DNN compiler that utilizes static code generation and kernel fusion techniques.
☆11 · Sep 18, 2024 · Updated last year
TiledTensor / TiledLower
TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.
☆13 · Nov 23, 2024 · Updated last year
TiledTensor / TiledCUDA
We invite you to visit and follow our new repository at https://github.com/microsoft/TileFusion. TiledCUDA is a highly efficient kernel …
☆192 · Jan 28, 2025 · Updated last year
latentCall145 / channels-last-groupnorm
A CUDA kernel for NHWC GroupNorm for PyTorch
☆23 · Nov 15, 2024 · Updated last year
jwnhy / coffer
Coffer is a RISC-V trusted execution environment developed in Rust.
☆21 · Mar 3, 2022 · Updated 4 years ago
LeiWang1999 / TVM.CMakeExtend
Tutorials of Extending and importing TVM with CMAKE Include dependency.
☆16 · Oct 11, 2024 · Updated last year
luojia65 / zihai
Zihai ('enjoy yourself') virtualization software, a type-1 hypervisor
☆25 · Apr 21, 2022 · Updated 4 years ago
YdrMaster / cuda-driver
A CUDA runtime environment based on the CUDA Driver API
☆16 · Jul 30, 2025 · Updated 11 months ago
rustsbi / serde-device-tree
Serialize & deserialize device tree binary using serde
☆23 · Dec 4, 2025 · Updated 7 months ago
JiangLiSJTU / token-ring
☆13 · Jan 7, 2025 · Updated last year
microsoft / cusync
☆27 · Feb 20, 2024 · Updated 2 years ago
AlexwellChen / Toy_ML_Framework
☆11 · May 16, 2026 · Updated 2 months ago
muriloboratto / NVSHEMEM
Sample Codes using NVSHMEM on Multi-GPU
☆30 · Jan 22, 2023 · Updated 3 years ago
StudyingLover / ggml-tutorial
☆34 · Sep 8, 2024 · Updated last year
pointpillars-on-openvino / pointpillars-on-openvino
☆12 · Dec 16, 2021 · Updated 4 years ago
OS-F-4 / usr-intr
Main repository of the project
☆26 · Sep 11, 2022 · Updated 3 years ago
hxdoit / lerobot
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
☆33 · Feb 19, 2026 · Updated 4 months ago
s-sd / task-amenability
☆18 · May 10, 2023 · Updated 3 years ago
iimmortall / QuantLib
☆14 · Feb 3, 2022 · Updated 4 years ago
InfiniTensor / gguf
handle gguf files
☆14 · Aug 14, 2025 · Updated 11 months ago
timothee-haudebourg / btree-range-map
B-tree range map implementation for Rust
☆13 · Oct 5, 2023 · Updated 2 years ago
reed-lau / cute-gemm
☆185 · May 11, 2026 · Updated 2 months ago
sophgo / libsophon
Sophgo AI chips driver and runtime library.
☆25 · Jun 30, 2026 · Updated 2 weeks ago
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆96 · Jun 21, 2026 · Updated 3 weeks ago
MARD1NO / CUDA-PPT
☆136 · Apr 16, 2026 · Updated 3 months ago
sgl-project / DeepGEMM
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
☆32 · Updated this week
yiakwy-xpu-ml-framework-team / flash-float-jit-kernels
☆23 · Updated this week
mayankagarwals / MLSys-FlashLinfer-Contest
☆48 · Updated this week
rchardx / hopper-gemm
☆48 · Nov 1, 2025 · Updated 8 months ago
oliverhu / rama
llama2 inference engine in Rust
☆13 · Apr 12, 2024 · Updated 2 years ago
pzhao-eng / FlashMLA
☆66 · Feb 15, 2026 · Updated 5 months ago