Coding CUDA every day!
★74 · Feb 5, 2026 · Updated 3 months ago
Alternatives and similar repositories for cuda
Users who are interested in cuda are comparing it to the libraries listed below.
- ★17 · Mar 8, 2025 · Updated last year
- [ACL 2026] CASS: Nvidia to AMD Transpilation with Data, Models, and Benchmark · ★34 · Apr 20, 2026 · Updated 2 weeks ago
- ★16 · Aug 7, 2024 · Updated last year
- ★17 · May 15, 2025 · Updated 11 months ago
- RTL implementation of a ray-tracing GPU · ★15 · Dec 18, 2012 · Updated 13 years ago
- Tiny-Megatron, a minimalistic re-implementation of the Megatron library · ★25 · Sep 1, 2025 · Updated 8 months ago
- Stable Diffusion in TensorRT 8.5+ · ★15 · Mar 19, 2023 · Updated 3 years ago
- This repository documents my 100-day journey of learning and writing CUDA kernels. · ★31 · Mar 29, 2026 · Updated last month
- A Machine Learning based tool for identifying P2P (Peer To Peer) Bot-Nets using network traffic analysis, as well as detect the hosts inv… · ★12 · Jan 4, 2023 · Updated 3 years ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. · ★481 · Mar 10, 2025 · Updated last year
- The official implementation for the intra-stage fusion technique introduced in https://arxiv.org/abs/2409.13221 · ★31 · Apr 22, 2025 · Updated last year
- HLS project modeling various sparse accelerators. · ★12 · Jan 11, 2022 · Updated 4 years ago
- learningggggggg · ★619 · Apr 2, 2025 · Updated last year
- IJMLC: Open-TI: Open Traffic Intelligence with Augmented Language Model · ★22 · Jul 30, 2025 · Updated 9 months ago
- Open source RTL implementation of Tensor Core, Sparse Tensor Core, BitWave and SparSynergy in the article: "SparSynergy: Unlocking Flexib… · ★24 · Mar 29, 2025 · Updated last year
- Pipeline Parallelism Emulation and Visualization · ★81 · Jan 8, 2026 · Updated 4 months ago
- ★15 · Feb 23, 2025 · Updated last year
- ★49 · May 20, 2025 · Updated 11 months ago
- High-performance RMSNorm implementation using SM core storage (registers and shared memory) · ★30 · Jan 22, 2026 · Updated 3 months ago
- A centralized hub for all your decentralized needs. · ★12 · Jul 28, 2021 · Updated 4 years ago
- Expert Specialization MoE Solution based on CUTLASS · ★26 · Apr 14, 2026 · Updated 3 weeks ago
- ★97 · Mar 21, 2026 · Updated last month
- Perceptron-based branch predictor written in C++ · ★13 · Dec 14, 2016 · Updated 9 years ago
- A Heterogeneous GPU Platform for AI and Neural Graphics · ★50 · Updated this week
- An intelligent matrix format designer for SpMV · ★10 · Oct 10, 2023 · Updated 2 years ago
- Cute layout visualization · ★38 · Jan 18, 2026 · Updated 3 months ago
- Implemented a LightGBM model capable of predicting the closing price movements for hundreds of NASDAQ listed stocks using data from the… · ★16 · Jan 8, 2024 · Updated 2 years ago
- A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer · ★96 · Feb 20, 2026 · Updated 2 months ago
- Letter of Recommendation (LOR) samples and guides found on grad schools' websites · ★13 · Sep 22, 2022 · Updated 3 years ago
- Tutorial on how to use Ginkgo AI embedding APIs for scientific problems · ★19 · Nov 22, 2024 · Updated last year
- Flash-Muon: An Efficient Implementation of Muon Optimizer · ★248 · Jun 15, 2025 · Updated 10 months ago
- Deploy ChatGLM on Modelz · ★16 · Mar 20, 2023 · Updated 3 years ago
- Pipelined 64-bit RISC-V core · ★16 · Mar 7, 2024 · Updated 2 years ago
- A Quirky Assortment of CuTe Kernels · ★955 · Updated this week
- ★11 · Feb 13, 2025 · Updated last year
- Cluster-level matrix unit integration into GPUs, implemented in Chipyard SoC · ★55 · Jan 20, 2026 · Updated 3 months ago
- Faster and efficient mechvibes.com alternative written in Rust · ★24 · Nov 29, 2024 · Updated last year
- cuJSON: A Highly Parallel JSON Parser for GPUs · ★46 · Dec 12, 2025 · Updated 4 months ago
- ★57 · Feb 24, 2026 · Updated 2 months ago