☆96May 30, 2026Updated last week
Alternatives and similar repositories for GPU_Programming
Users who are interested in GPU_Programming are comparing it to the libraries listed below.
- Step by step implementation of a fast softmax kernel in CUDA☆68Jan 6, 2025Updated last year
- torch.compile artifacts for common deep learning models, can be used as a learning resource for torch.compile☆19Dec 22, 2023Updated 2 years ago
- BFloat16 Fused Adam Operator for PyTorch☆19Nov 16, 2024Updated last year
- General Matrix Multiplication using NVIDIA Tensor Cores☆28Jan 25, 2025Updated last year
- ☆92Feb 29, 2024Updated 2 years ago
- ☆13Dec 22, 2024Updated last year
- Official repository Flash Local Linear Attention☆36May 28, 2026Updated last week
- ☆20May 30, 2026Updated last week
- OpenShell is the safe, private runtime for autonomous AI agents.☆153May 29, 2026Updated last week
- [ACL'24 Oral] Analysing The Impact of Sequence Composition on Language Model Pre-Training☆23Aug 18, 2024Updated last year
- Code for the EMNLP24 paper "A simple and effective L2 norm based method for KV Cache compression."☆18Dec 13, 2024Updated last year
- A series of high-performance GEMM (General Matrix Multiply) implementations, iteratively optimised for H100 GPUs in pure CUDA.☆77Feb 18, 2026Updated 3 months ago
- IBM Spectrum LSF - IBM Cloud☆16Sep 30, 2024Updated last year
- Homepage of Software Engineering for Machine Learning☆17May 25, 2026Updated 2 weeks ago
- My study notes and hands-on projects for CUDA-based GPU programming☆12Dec 11, 2025Updated 5 months ago
- A regex engine that is implemented in the traditional way and can generate graphs of its finite automata.☆10May 3, 2018Updated 8 years ago
- ☆15Feb 13, 2018Updated 8 years ago
- Comparing Deep Learning Inference of Pytorch models running on CPU, CUDA and TensorRT☆17Feb 20, 2022Updated 4 years ago
- ☆14Feb 23, 2025Updated last year
- NVIDIA tools guide☆165Jan 7, 2025Updated last year
- Signal processing features based on xtensor☆12Nov 25, 2023Updated 2 years ago
- Using FVM to solve navier stokes equations in the 2D lid driven cavity problem☆12Jan 21, 2018Updated 8 years ago
- An example repo for generating python bindings with cppyy.☆14Nov 19, 2019Updated 6 years ago
- Planetary crustal and mantle properties, and lithospheric displacements, stress and strain, calculations in spherical harmonics.☆15May 22, 2026Updated 2 weeks ago
- Read custom dataset☆12Mar 31, 2023Updated 3 years ago
- Repository to host ROCm Developer Hub Notebook Tutorials☆83Jun 1, 2026Updated last week
- Flash Attention in raw Cuda C beating PyTorch☆39May 14, 2024Updated 2 years ago
- ☆57Updated this week
- Simple command line to get directory information☆13May 27, 2025Updated last year
- Row-wise block scaling for fp8 quantization matrix multiplication. Solution to GPU mode AMD challenge.☆19Feb 9, 2026Updated 4 months ago
- Reinforcement Learning example in Nim, playing tic tac toe. Based off original C version from the great Antirez☆15Apr 2, 2025Updated last year
- torchcomms: a modern PyTorch communications API☆368Updated this week
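- My understanding of the code in each chapter of 《x86汇编语言 从实模式到保护模式》 (x86 Assembly Language: From Real Mode to Protected Mode), written while reading the book, with some of the code annotated☆10Apr 12, 2026Updated last month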
- ☆14Mar 29, 2026Updated 2 months ago
- C++ version of Conway's Game of Life with raylib. This project is accompanied by a video tutorial that explains everything in detail.☆12Mar 14, 2024Updated 2 years ago
- ☆23Feb 16, 2022Updated 4 years ago
- A curriculum for learning about gpu performance engineering, from scratch to what the frontier AI labs do☆804Apr 27, 2026Updated last month
- Apply GPU in ML and DL☆68Mar 23, 2026Updated 2 months ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.☆488Mar 10, 2025Updated last year