YuxueYang1204 / CudaDemo
Implement custom operators in PyTorch with cuda/c++
☆76Jan 1, 2023Updated 3 years ago
Alternatives and similar repositories for CudaDemo
Users that are interested in CudaDemo are comparing it to the libraries listed below
- from MHA, MQA, GQA to MLA by 苏剑林, with code☆42Feb 19, 2025Updated 11 months ago
- A library for parsing images in Mojo☆20Apr 14, 2025Updated 10 months ago
- ☆19Aug 20, 2025Updated 5 months ago
- ☆36Aug 25, 2023Updated 2 years ago
- ☆17Nov 14, 2023Updated 2 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆20Jan 24, 2025Updated last year
- Geometric Algebra☆24Nov 21, 2025Updated 2 months ago
- ☆30Feb 10, 2026Updated last week
- spark-sight: Spark performance at a glance☆10Apr 6, 2023Updated 2 years ago
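- Free ChatGPT API key: a free ChatGPT API that supports the GPT4 API, offered as a free forwarding API usable in mainland China, with direct connection and no proxy required.☆13Aug 28, 2024Updated last year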
- Asynchronous CUDA for Rust.☆37Sep 18, 2025Updated 4 months ago
- TensorRT encapsulation, learn, rewrite, practice.☆30Oct 19, 2022Updated 3 years ago
- The YOLOv10 C++ TensorRT Project in C++ and optimized using NVIDIA TensorRT☆36Oct 14, 2024Updated last year
- Several simple examples for popular neural network toolkits calling custom CUDA operators.☆1,526Apr 29, 2021Updated 4 years ago
- ☆52Jan 5, 2026Updated last month
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉☆9,666Updated this week
- ☆34Mar 29, 2023Updated 2 years ago
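- A good project for campus, autumn, and spring recruiting and for internships: build from scratch, hands-on, a large-model inference framework supporting LLama2/3 and Qwen2.5.☆494Oct 28, 2025Updated 3 months ago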
- Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Basics and Practice)☆154Apr 28, 2022Updated 3 years ago
- A Simple and Efficient Automatic White Balance Algorithm Implementation In C☆10Mar 4, 2019Updated 6 years ago
- ☆17May 27, 2025Updated 8 months ago
- Deploy Yolo series algorithms on Hisilicon platform hi3516, including yolov3, yolov5, yolox, etc☆11Mar 25, 2022Updated 3 years ago
- A small C# project that imitates the Snipaste screenshot-pinning tool☆14Dec 19, 2020Updated 5 years ago
- Is it difficult to develop C++ high-concurrency server applications? Come and use XServer☆10Jun 13, 2024Updated last year
- Delve is a debugger for the Go programming language.☆11Apr 9, 2023Updated 2 years ago
- A flexible utility for converting tensor precision in PyTorch models and safetensors files, enabling efficient deployment across various …☆11Aug 24, 2023Updated 2 years ago
- RISC-V emulator in Zig☆15Nov 4, 2023Updated 2 years ago
- ☆19Jan 16, 2026Updated last month
- 🧠🖼️🐍 A Python wrapper around the BrainFrame REST API☆12Jan 7, 2025Updated last year
- This is "ready from box" face recognition app, based on Mediapipe, dlib and face_recognition modules.☆11Dec 31, 2023Updated 2 years ago
- Bilibili: digital electronics PPT slides☆11Feb 19, 2024Updated last year
- [Qt5 Development and Examples (3rd Edition)][陆文周][program source code]☆10May 23, 2018Updated 7 years ago
- Implement some method of LLM KV Cache Sparsity☆41Jun 6, 2024Updated last year
- something for paper agent☆11Dec 18, 2024Updated last year
- C++ framework for computer vision inference, supporting multiple vision tasks and deep learning backends.☆93Jan 27, 2026Updated 3 weeks ago
- how to optimize some algorithm in cuda.☆2,819Updated this week
- 🚀 Simple and efficient use for Ultralytics yolov5🚀☆32Jan 17, 2023Updated 3 years ago
- Implementation of FlashAttention in PyTorch☆180Jan 12, 2025Updated last year
- YOLOv12 TensorRT end-to-end accelerated model inference and INT8 quantization implementation☆13Mar 5, 2025Updated 11 months ago