Implement custom operators in PyTorch with CUDA/C++
☆76 · Jan 1, 2023 · Updated 3 years ago
Alternatives and similar repositories for CudaDemo
Users who are interested in CudaDemo are comparing it to the libraries listed below.
- from MHA, MQA, GQA to MLA by Su Jianlin (苏剑林), with code · ☆43 · Feb 19, 2025 · Updated last year
- A library for parsing images in Mojo · ☆20 · Apr 14, 2025 · Updated 10 months ago
- ☆20 · Aug 20, 2025 · Updated 6 months ago
- Bleeding edge low level Rust binding for GGML · ☆16 · Jun 26, 2024 · Updated last year
- ☆36 · Aug 25, 2023 · Updated 2 years ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs. · ☆20 · Jan 24, 2025 · Updated last year
- Official implementation of ICCV25 paper "Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing" · ☆32 · Sep 9, 2025 · Updated 6 months ago
- easy cuda code · ☆96 · Dec 24, 2024 · Updated last year
- Geometric Algebra · ☆24 · Nov 21, 2025 · Updated 3 months ago
- ☆21 · Apr 17, 2025 · Updated 10 months ago
- ☆31 · Feb 25, 2026 · Updated 2 weeks ago
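- A strong project for campus, fall, and spring recruiting and for internship applications: it guides you through implementing, hands-on and from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5. · ☆507 · Oct 28, 2025 · Updated 4 months ago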
- Several simple examples for popular neural network toolkits calling custom CUDA operators. · ☆1,527 · Apr 29, 2021 · Updated 4 years ago
- spark-sight: Spark performance at a glance · ☆10 · Apr 6, 2023 · Updated 2 years ago
- An introductory tutorial on recommender systems, covering the fundamentals with corresponding runnable examples · ☆10 · Jan 9, 2024 · Updated 2 years ago
- TensorRT encapsulation, learn, rewrite, practice. · ☆30 · Oct 19, 2022 · Updated 3 years ago
- The YOLOv10 C++ TensorRT Project in C++ and optimized using NVIDIA TensorRT · ☆36 · Oct 14, 2024 · Updated last year
- ☆74 · Jan 25, 2025 · Updated last year
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉 · ☆9,815 · Feb 25, 2026 · Updated last week
- Collection of latest papers and materials in the area of RLVR! · ☆65 · Mar 2, 2026 · Updated last week
- A C# WPF serial-port assistant built on .NET 5 · ☆13 · Mar 21, 2022 · Updated 3 years ago
- how to optimize some algorithm in cuda. · ☆2,841 · Feb 28, 2026 · Updated last week
- flash attention tutorial written in python, triton, cuda, cutlass · ☆490 · Jan 20, 2026 · Updated last month
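- A small C# project that imitates the Snipaste screenshot-pinning tool · ☆14 · Dec 19, 2020 · Updated 5 years ago
- Bilibili: digital electronics course slides (PPT) · ☆11 · Feb 19, 2024 · Updated 2 years ago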
- A codebase for data crawling and preprocessing for TTS and ASR systems training. · ☆22 · Feb 26, 2026 · Updated last week
- Implement some method of LLM KV Cache Sparsity · ☆40 · Jun 6, 2024 · Updated last year
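- A maintained (re-created) 2024 version of yolov5+deepsort object detection and tracking. It displays object classes and can be trained on your own dataset. It includes some test videos to try out and provides recognition output in two formats, txt and json. It can be used for recognition projects, road-surface recognition, intelligent transportation, graduation projects, and more. · ☆10 · Feb 28, 2024 · Updated 2 years ago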
- Deploy Yolo series algorithms on Hisilicon platform hi3516, including yolov3, yolov5, yolox, etc · ☆11 · Mar 25, 2022 · Updated 3 years ago
- Networking study notes; synced board: https://github.com/orgs/apachecn/teams/diaosi · ☆11 · Jan 7, 2020 · Updated 6 years ago
- something for paper agent · ☆11 · Dec 18, 2024 · Updated last year
- ☆24 · Nov 21, 2025 · Updated 3 months ago
- 🧠🖼️🐍 A Python wrapper around the BrainFrame REST API · ☆12 · Jan 7, 2025 · Updated last year
- [Qt5 Development and Examples (3rd Edition)][Lu Wenzhou (陆文周)][program source code] · ☆10 · May 23, 2018 · Updated 7 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi… · ☆23 · Oct 1, 2025 · Updated 5 months ago
- Notes and Examples to get started Parallel Computing with CUDA. · ☆13 · Nov 1, 2019 · Updated 6 years ago
- LLM-DSE: Searching Accelerator Parameters with LLM Agents · ☆13 · May 22, 2025 · Updated 9 months ago
- An object detection model for NMNIST larger video frame · ☆12 · Feb 24, 2022 · Updated 4 years ago
- Answers for Programming Massively Parallel Processors (Chinese title 大规模并行处理器编程实战), 2nd edition · ☆34 · Jun 4, 2022 · Updated 3 years ago