Will write CUDA for 100 days
☆39 · May 25, 2025 · Updated 10 months ago
Alternatives and similar repositories for 100-days-of-cuda
Users who are interested in 100-days-of-cuda are comparing it to the repositories listed below.
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA. ☆30 · Mar 26, 2026 · Updated 2 weeks ago
- A small personal project for learning compiler principles and understanding the main workflow of building a compiler ☆10 · Oct 7, 2020 · Updated 5 years ago
- Compile TensorFlow to C++ library for CMake project ☆15 · Oct 30, 2017 · Updated 8 years ago
- A c++ client library for redis cluster. ☆14 · Mar 9, 2016 · Updated 10 years ago
- An expression parser supporting multiple types ☆21 · Sep 25, 2024 · Updated last year
- ☆45 · May 4, 2025 · Updated 11 months ago
- Programming with Go and large language models ☆44 · Mar 5, 2025 · Updated last year
- Explore training for quantized models ☆26 · Jul 12, 2025 · Updated 8 months ago
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge. ☆19 · Feb 9, 2026 · Updated 2 months ago
- A bunch of kernels that might make stuff slower 😉 ☆87 · Updated this week
- GEMM ☆10 · Aug 26, 2023 · Updated 2 years ago
- Compile & run a single CUDA file on the cloud GPUs ☆14 · Sep 8, 2024 · Updated last year
- ☆11 · Sep 21, 2022 · Updated 3 years ago
- GPU Kernels ☆223 · Apr 27, 2025 · Updated 11 months ago
- A multi-thread implementation of node2vec random walk. ☆27 · Jan 23, 2021 · Updated 5 years ago
- Examples for learning assembly language ☆10 · Aug 5, 2021 · Updated 4 years ago
- Full-stack study notes from Kaikeba (开课吧) ☆11 · Apr 19, 2022 · Updated 3 years ago
- GEMV implementation with CUTLASS ☆19 · Aug 21, 2025 · Updated 7 months ago
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression. ☆13 · Sep 29, 2024 · Updated last year
- ☆14 · Nov 3, 2025 · Updated 5 months ago
- Companion code for the book 《汇编语言一发入魂》 ☆15 · May 30, 2020 · Updated 5 years ago
- Code of the paper "SPINE: Structural Identity Preserved Inductive Network Embedding" ☆12 · Jul 29, 2019 · Updated 6 years ago
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient ☆66 · Aug 3, 2025 · Updated 8 months ago
- 。 ☆13 · Jan 15, 2022 · Updated 4 years ago
- DoubleAI’s hyperoptimised version of cuGraph ☆50 · Mar 3, 2026 · Updated last month
- ☆18 · Nov 22, 2025 · Updated 4 months ago
- China University MOOC, Zhejiang University, Weng Kai's online course on C programming: a record of teaching myself to program from scratch. ☆16 · May 18, 2020 · Updated 5 years ago
- ☆32 · Jul 2, 2025 · Updated 9 months ago
- VHDL digital circuit design ☆13 · Aug 30, 2017 · Updated 8 years ago
- 《PostgreSQL内部机制剖析(译)》 (a translated analysis of PostgreSQL internals), for database administrators and system developers ☆18 · Jan 20, 2020 · Updated 6 years ago
- Fast GPU based tensor core reductions ☆13 · Jan 13, 2023 · Updated 3 years ago
- portFFT is a library implementing Fast Fourier Transforms using SYCL ☆19 · Mar 1, 2025 · Updated last year
- All Resources from Stanford CS106B 2021 ☆24 · Jul 11, 2025 · Updated 9 months ago
- Toy vector database written in c99. ☆25 · Sep 5, 2024 · Updated last year
- a reactor network library ☆16 · Aug 21, 2025 · Updated 7 months ago
- ☆67 · May 23, 2025 · Updated 10 months ago
- My submission for the GPUMODE/AMD fp8 mm challenge ☆29 · Jun 4, 2025 · Updated 10 months ago
- ☆12 · Sep 2, 2025 · Updated 7 months ago
- Welcome to the GPU-FFT-Optimization repository! We present cutting-edge algorithms and implementations for optimizing the Fast Fourier Tr… ☆21 · Dec 19, 2025 · Updated 3 months ago