Will write CUDA for 100 days
☆38May 25, 2025Updated 9 months ago
Alternatives and similar repositories for 100-days-of-cuda
Users who are interested in 100-days-of-cuda are comparing it to the repositories listed below
- A std::execution style runtime context and High Performance RPC Transport for using OpenUCX. Including CUDA/ROCM/... devices with RDMA.☆29Feb 22, 2026Updated 3 weeks ago
- Compile TensorFlow into a C++ library for CMake projects☆15Oct 30, 2017Updated 8 years ago
- A C++ client library for Redis Cluster.☆14Mar 9, 2016Updated 10 years ago
- Linux from beginner to master☆32Dec 4, 2025Updated 3 months ago
- A bunch of kernels that might make stuff slower 😉☆79Updated this week
- ☆45May 4, 2025Updated 10 months ago
- Programming with Go and large language models☆44Mar 5, 2025Updated last year
- Explore training for quantized models☆26Jul 12, 2025Updated 8 months ago
- GEMM☆10Aug 26, 2023Updated 2 years ago
- Compile & run a single CUDA file on the cloud GPUs☆14Sep 8, 2024Updated last year
- ☆11Sep 21, 2022Updated 3 years ago
- GPU Kernels☆222Apr 27, 2025Updated 10 months ago
- A multi-thread implementation of node2vec random walk.☆27Jan 23, 2021Updated 5 years ago
- Kaikeba full-stack study notes☆11Apr 19, 2022Updated 3 years ago
- ☆47Mar 27, 2023Updated 2 years ago
- GEMV implementation with CUTLASS☆19Aug 21, 2025Updated 6 months ago
- Multi-heap-sort for many small arrays, quicksort with 3 pivots for one big array, CUDA acceleration, CUDA memory compression.☆13Sep 29, 2024Updated last year
- Companion code for the book 《汇编语言一发入魂》☆15May 30, 2020Updated 5 years ago
- Export yolov5 model to run on cpu using tflite☆14Aug 12, 2021Updated 4 years ago
- All Resources from Stanford CS106B 2021☆24Jul 11, 2025Updated 8 months ago
- Code of the paper "SPINE: Structural Identity Preserved Inductive Network Embedding"☆12Jul 29, 2019Updated 6 years ago
- Official implementation for DenseMixer: Improving MoE Post-Training with Precise Router Gradient☆66Aug 3, 2025Updated 7 months ago
- 。☆13Jan 15, 2022Updated 4 years ago
- ☆18Nov 22, 2025Updated 3 months ago
- DoubleAI’s hyperoptimised version of cuGraph☆51Mar 3, 2026Updated 2 weeks ago
- ☆32Jul 2, 2025Updated 8 months ago
- Chinese translation of "The Internals of PostgreSQL", for database administrators and system developers☆18Jan 20, 2020Updated 6 years ago
- Fast GPU based tensor core reductions☆13Jan 13, 2023Updated 3 years ago
- Experimental GPU language with meta-programming☆27Sep 6, 2024Updated last year
- portFFT is a library implementing Fast Fourier Transforms using SYCL☆19Mar 1, 2025Updated last year
- [ICML 2025] SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity☆71Mar 10, 2026Updated last week
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆19Mar 12, 2026Updated last week
- ☆12Aug 31, 2023Updated 2 years ago
- A reactor network library☆16Aug 21, 2025Updated 6 months ago
- ☆67May 23, 2025Updated 9 months ago
- ☆15Mar 23, 2022Updated 3 years ago
- ☆13Sep 2, 2025Updated 6 months ago
- Welcome to the GPU-FFT-Optimization repository! We present cutting-edge algorithms and implementations for optimizing the Fast Fourier Tr…☆21Dec 19, 2025Updated 3 months ago
- Official repository of Quickscorer: a fast algorithm to rank documents with additive ensembles of regression trees.☆18Aug 11, 2016Updated 9 years ago