🎉 CUDA notes / frequently asked interview questions / C++ notes. Personal notes, updated sporadically: sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
☆51 · Feb 23, 2024 · Updated 2 years ago
Alternatives and similar repositories for CUDA-Learn-Note
Users that are interested in CUDA-Learn-Note are comparing it to the libraries listed below.
- A Low-Overhead tool for Floating-Point Exception Detection in NVIDIA GPUs ☆13 · Dec 17, 2024 · Updated last year
- ☆11 · Oct 14, 2023 · Updated 2 years ago
- ☆23 · Aug 14, 2024 · Updated last year
- Code for paper: Latent-space Dynamics for Reduced Deformable Simulation ☆38 · May 29, 2019 · Updated 6 years ago
- A 3D fluid simulation on the GPU using C++ and Vulkan. ☆13 · Jun 12, 2022 · Updated 3 years ago
- 🐱 ncnn int8 model quantization evaluation ☆14 · Oct 10, 2022 · Updated 3 years ago
- ☆17 · May 4, 2017 · Updated 9 years ago
- Adaptive Topology Reconstruction for Robust Graph Representation Learning [Efficient ML Model] ☆10 · Feb 11, 2025 · Updated last year
- ☆12 · Apr 16, 2024 · Updated 2 years ago
- GEMM by WMMA (tensor core) ☆15 · Jul 31, 2022 · Updated 3 years ago
- 📚 LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners 🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA. 🎉 ☆11,050 · May 17, 2026 · Updated last week
- A project to quickly detect discrepancies in floating point computation across hardware, compilers, libraries and software. ☆39 · Nov 14, 2024 · Updated last year
- 🚀 Train a VLA "large model" yourself, end to end: a 26M-parameter vision multimodal VLM trained from scratch in just 1 hour! 🌏 ☆33 · Oct 16, 2025 · Updated 7 months ago
- An implementation of "Air Meshes for Robust Collision Handling", SIGGRAPH (2015) ☆16 · Aug 30, 2017 · Updated 8 years ago
- NeurIPS 2020 Spotlight Paper ☆13 · Dec 20, 2021 · Updated 4 years ago
- Implementation of Speculative Sampling as described in "Accelerating Large Language Model Decoding with Speculative Sampling" by Deepmind ☆111 · Feb 29, 2024 · Updated 2 years ago
- Parallel Prefix Sum (Scan) with CUDA ☆29 · Jun 22, 2024 · Updated last year
- Links to a collection of Chinese-language C++ resources: the Chinese edition of the awesome C++ resource list, covering the standard library, web application frameworks, artificial intelligence, databases, image processing, machine learning, logging, code analysis, and more. Maintained and updated by the teams behind the WeChat official accounts 「开源前哨」 and 「CPP开发者」. https://github.com/jobbole/awesome-cpp-cn ☆20 · Mar 17, 2023 · Updated 3 years ago
- This is the implementation repository of our ICSE'22 paper: Muffin: Testing Deep Learning Libraries via Neural Architecture Fuzzing. ☆33 · Jun 17, 2022 · Updated 3 years ago
- ☆13 · May 14, 2024 · Updated 2 years ago
- Implementation of our paper: Komaritzan and Botsch, Fast Projective Skinning, ACM MIG 2019. ☆58 · Jan 27, 2024 · Updated 2 years ago
- A Benchmark Suite for Heterogeneous System Computation ☆56 · Feb 20, 2025 · Updated last year
- A large-scale training and benchmarking framework for rPPG. ☆10 · Nov 26, 2024 · Updated last year
- PhysWorld: From Real Videos to World Models of Deformable Objects via Physics-Aware Demonstration Synthesis ☆37 · Oct 27, 2025 · Updated 6 months ago
- This repository contains the source codes for the paper: "SPACE: A Simulator for Physical Interactions and Causal Learning in 3D Environm… ☆16 · Oct 11, 2021 · Updated 4 years ago
- Accelerating Multitask Training Through Adaptive Transition [Efficient ML Model] ☆12 · May 23, 2025 · Updated last year
- A C++ port of karpathy/micrograd, a tiny scalar-valued autograd engine and a neural net library ☆13 · Nov 24, 2023 · Updated 2 years ago
- ☆21 · May 13, 2022 · Updated 4 years ago
- ☆11 · Jun 13, 2022 · Updated 3 years ago
- Multi-GPU Framework for Voxel Grid Computations ☆67 · Mar 26, 2026 · Updated 2 months ago
- Code for 'Real-time Large-scale Deformation of Gaussian Splatting' ☆102 · Sep 6, 2024 · Updated last year
- Implementation of analytic collision penalty eigensystems (with Matlab) ☆19 · Oct 23, 2025 · Updated 7 months ago
- Unstructured computations on emerging architectures. ☆15 · Jun 1, 2022 · Updated 3 years ago
- taichi hackathon repo. ☆18 · Dec 15, 2022 · Updated 3 years ago
- Multi-agent reinforcement learning for adaptive mesh refinement ☆14 · Aug 15, 2023 · Updated 2 years ago
- A Winograd Minimal Filter Implementation in CUDA ☆29 · Aug 25, 2021 · Updated 4 years ago
- Some physics implemented on Taichi-AOT & Unity ☆17 · Dec 4, 2022 · Updated 3 years ago
- Modified g2o with GPU support for general matrix calculations. ☆15 · Jun 28, 2025 · Updated 10 months ago
- ☆15 · Aug 29, 2021 · Updated 4 years ago