Implement custom operators in PyTorch with CUDA/C++
☆77Jan 1, 2023Updated 3 years ago
Alternatives and similar repositories for CudaDemo
Users that are interested in CudaDemo are comparing it to the libraries listed below.
- ☆21Aug 20, 2025Updated 7 months ago
- from MHA, MQA, GQA to MLA by 苏剑林, with code☆46Feb 19, 2025Updated last year
- ☆15Jun 26, 2024Updated last year
- ☆22Apr 17, 2025Updated 11 months ago
- Official implementation of ICCV25 paper "Trace3D: Consistent Segmentation Lifting via Gaussian Instance Tracing"☆35Sep 9, 2025Updated 7 months ago
- High Performance FP8 GEMM Kernels for SM89 and later GPUs.☆21Jan 24, 2025Updated last year
- TensorRT encapsulation, learn, rewrite, practice.☆29Oct 19, 2022Updated 3 years ago
- PyTorch usage tips and tutorials☆12Apr 17, 2023Updated 2 years ago
- The code of ICLR 2024 paper: Boosting the Adversarial Robustness of Graph Neural Networks: An OOD Perspective☆14May 30, 2024Updated last year
- Learning problem-solving, logic/set, math, physics, economics through functional programming using Haskell☆19Oct 16, 2015Updated 10 years ago
- Chisel3 AXI4-{Lite, Full, Stream} Definitions☆15Dec 31, 2018Updated 7 years ago
- An attempt to migrate Karpathy's llm.c to safe rust.☆13Jun 4, 2024Updated last year
- Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Basics and Practice)☆162Apr 28, 2022Updated 3 years ago
- Several simple examples for popular neural network toolkits calling custom CUDA operators.☆1,529Apr 29, 2021Updated 4 years ago
- Open Source Computer Vision Library☆13Oct 22, 2015Updated 10 years ago
- 📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉☆10,217Updated this week
- Nankai University operating systems course labs (UCore)☆11Oct 16, 2022Updated 3 years ago
- Nanjing University ICS2019 PA labs; lab manual: https://nju-projectn.github.io/ics-pa-gitbook/ics2019/ ☆10Aug 22, 2020Updated 5 years ago
- A library for parsing images in Mojo☆20Apr 14, 2025Updated 11 months ago
- MicroMix: Efficient Mixed-Precision Quantization with Microscaling Formats for Large Language Models☆28Apr 2, 2026Updated last week
- ☆11Aug 8, 2018Updated 7 years ago
- Nanjing University, Department of Computer Science and Technology, 2019 Fundamentals of Computer Systems PA☆14Sep 18, 2020Updated 5 years ago
- Try to reproduce FreeNeRF☆14May 6, 2023Updated 2 years ago
- A package for pedestrian detection, tracking, and re-identification.☆13Feb 28, 2021Updated 5 years ago
- Official repository for the paper Local Linear Attention: An Optimal Interpolation of Linear and Softmax Attention For Test-Time Regressi…☆23Oct 1, 2025Updated 6 months ago
- Source codes and Datasets for EDDA in CIKM'23☆21Aug 8, 2023Updated 2 years ago
- [NAACL'25 🏆 SAC Award] Official code for "Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert…☆16Feb 4, 2025Updated last year
- ☆12Nov 29, 2022Updated 3 years ago
- row-major matmul optimization☆713Feb 24, 2026Updated last month
- [ICML'25] Official code for paper "Occult: Optimizing Collaborative Communication across Experts for Accelerated Parallel MoE Training an…☆13Apr 17, 2025Updated 11 months ago
- ☆14Sep 13, 2022Updated 3 years ago
- How to optimize some algorithms in CUDA.☆2,910Apr 1, 2026Updated last week
- Is it difficult to develop C++ high-concurrency server applications? Come and use XServer☆10Jun 13, 2024Updated last year
- Official Python/ROS Implementation for "A Novel Multi-layer Framework for Tiny Obstacle Discovery", ICRA 2019☆14Jul 26, 2020Updated 5 years ago
- GPU-accelerated Ant Colony Optimization (ACO)☆17Feb 28, 2025Updated last year
- A minimal GUI for 3DGS using DearPyGUI framework☆14May 15, 2025Updated 10 months ago
- ☆2,724Jan 16, 2024Updated 2 years ago
- implementation of our IJCAI'24 paper "Cross-Problem Learning for Solving Vehicle Routing Problems".☆20Aug 17, 2024Updated last year
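- A good project for campus recruiting (autumn and spring hiring) and internships: build, from scratch, an LLM inference framework that supports LLama2/3 and Qwen2.5.☆528Oct 28, 2025Updated 5 months ago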