☆96 · Nov 11, 2025 · Updated 6 months ago
Alternatives and similar repositories for GPU_Programming
Users that are interested in GPU_Programming are comparing it to the libraries listed below.
- Step by step implementation of a fast softmax kernel in CUDA · ☆67 · Jan 6, 2025 · Updated last year
- torch.compile artifacts for common deep learning models, can be used as a learning resource for torch.compile · ☆19 · Dec 22, 2023 · Updated 2 years ago
- BFloat16 Fused Adam Operator for PyTorch · ☆19 · Nov 16, 2024 · Updated last year
- General Matrix Multiplication using NVIDIA Tensor Cores · ☆28 · Jan 25, 2025 · Updated last year
- ☆13 · Dec 22, 2024 · Updated last year
- Official repository for Flash Local Linear Attention · ☆23 · Apr 23, 2026 · Updated 3 weeks ago
- RAPIDS Deployment Documentation · ☆15 · May 13, 2026 · Updated last week
- A series of high-performance GEMM (General Matrix Multiply) implementations iteratively optimised for H100 GPUs in pure CUDA. · ☆77 · Feb 18, 2026 · Updated 3 months ago
- ☆14 · Feb 23, 2025 · Updated last year
- NVIDIA tools guide · ☆165 · Jan 7, 2025 · Updated last year
- Read custom dataset · ☆12 · Mar 31, 2023 · Updated 3 years ago
- Repository to host ROCm Developer Hub Notebook Tutorials · ☆78 · May 1, 2026 · Updated 2 weeks ago
- ☆14 · Apr 10, 2023 · Updated 3 years ago
- Decompose source code into templates and fragments for any language. · ☆23 · Aug 29, 2022 · Updated 3 years ago
- Flash Attention in raw CUDA C beating PyTorch · ☆38 · May 14, 2024 · Updated 2 years ago
- Reinforcement Learning example in Nim, playing tic tac toe. Based on the original C version from the great Antirez · ☆15 · Apr 2, 2025 · Updated last year
- [UNMAINTAINED] md6 FTW · ☆10 · Mar 17, 2016 · Updated 10 years ago
- Variational Autoencoder with non-Euclidean (hyperbolic) latent space · ☆12 · Nov 25, 2022 · Updated 3 years ago
- A curriculum for learning about GPU performance engineering, from scratch to what the frontier AI labs do · ☆672 · Apr 27, 2026 · Updated 3 weeks ago
- torchcomms: a modern PyTorch communications API · ☆362 · Updated this week
- A blog for LLVM (v11.0.0) beginners, step by step, with detailed documents and comments. Records the way I learn LLVM. · ☆14 · Jun 17, 2022 · Updated 3 years ago
- ☆14 · Mar 29, 2026 · Updated last month
- A repo based on XiLin Li's PSGD repo that extends some of the experiments. · ☆14 · Oct 7, 2024 · Updated last year
- Apply GPU in ML and DL · ☆69 · Mar 23, 2026 · Updated last month
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code. · ☆482 · Mar 10, 2025 · Updated last year
- Implementation from scratch in CUDA C++ of image processing algorithms. · ☆22 · Oct 26, 2020 · Updated 5 years ago
- A collection of algorithms that use partial information to reverse various hashes used by Minecraft to seed its ChunkRandom PRNG. · ☆10 · Mar 7, 2024 · Updated 2 years ago
- ☆24 · Apr 7, 2026 · Updated last month
- ☆40 · Feb 14, 2026 · Updated 3 months ago
- Optimized Deep Learning training course for Jean Zay · ☆19 · Oct 20, 2025 · Updated 7 months ago
- Real-time image and video foveation transform using PyCUDA · ☆11 · Jan 6, 2021 · Updated 5 years ago
- ring-attention experiments · ☆167 · Oct 17, 2024 · Updated last year
- ☆18 · Oct 12, 2022 · Updated 3 years ago
- Source code to accompany research paper on training multi token prediction language models using self-distillation. · ☆37 · Feb 21, 2026 · Updated 3 months ago
- ☆1,505 · Mar 31, 2026 · Updated last month
- ☆16 · Dec 30, 2024 · Updated last year
- Grasp Generation models on OakInk-Shape dataset · ☆17 · Apr 4, 2024 · Updated 2 years ago
- Volumetric MRI visualization and analysis tool for BraTS datasets. Converts NIfTI slices to 3D meshes using Marching Cubes algorithm with… · ☆26 · Nov 30, 2025 · Updated 5 months ago
- Research on DeepSeek Sparse Attention · ☆40 · Oct 8, 2025 · Updated 7 months ago