☆95 · Nov 11, 2025 · Updated 5 months ago
Alternatives and similar repositories for GPU_Programming
Users that are interested in GPU_Programming are comparing it to the libraries listed below.
- Step-by-step implementation of a fast softmax kernel in CUDA · ☆68 · Jan 6, 2025 · Updated last year
- A proof-of-concept implementation of Titans: models mixing long-term, short-term and persistent memories · ☆24 · Apr 9, 2025 · Updated last year
- BFloat16 Fused Adam Operator for PyTorch · ☆19 · Nov 16, 2024 · Updated last year
- General Matrix Multiplication using NVIDIA Tensor Cores · ☆28 · Jan 25, 2025 · Updated last year
- OpenShell is the safe, private runtime for autonomous AI agents · ☆119 · Apr 11, 2026 · Updated 2 weeks ago
- Official repository for Flash Local Linear Attention · ☆23 · Apr 23, 2026 · Updated last week
- RAPIDS Deployment Documentation · ☆15 · Apr 17, 2026 · Updated 2 weeks ago
- ☆20 · Apr 24, 2026 · Updated last week
- A series of high-performance GEMM (General Matrix Multiply) implementations, iteratively optimised for H100 GPUs in pure CUDA · ☆77 · Feb 18, 2026 · Updated 2 months ago
- My study notes and hands-on projects for CUDA-based GPU programming · ☆11 · Dec 11, 2025 · Updated 4 months ago
- ☆15 · Feb 13, 2018 · Updated 8 years ago
- Comparing Deep Learning Inference of PyTorch models running on CPU, CUDA and TensorRT · ☆16 · Feb 20, 2022 · Updated 4 years ago
- Hugging Face Download (Cache) Manager · ☆22 · Aug 7, 2022 · Updated 3 years ago
- NVIDIA tools guide · ☆165 · Jan 7, 2025 · Updated last year
- ☆27 · May 18, 2022 · Updated 3 years ago
- Read custom dataset · ☆12 · Mar 31, 2023 · Updated 3 years ago
- Repository to host ROCm Developer Hub Notebook Tutorials · ☆70 · Apr 23, 2026 · Updated last week
- Reinforcement Learning example in Nim, playing tic-tac-toe. Based on the original C version by the great Antirez · ☆15 · Apr 2, 2025 · Updated last year
- Small-scale distributed training of sequential deep learning models, built on NumPy and MPI · ☆165 · Oct 19, 2023 · Updated 2 years ago
- ☆12 · Apr 26, 2024 · Updated 2 years ago
- A curriculum for learning about GPU performance engineering, from scratch to what the frontier AI labs do · ☆598 · Mar 2, 2026 · Updated last month
- Variational Autoencoder with a non-Euclidean (hyperbolic) latent space · ☆12 · Nov 25, 2022 · Updated 3 years ago
- My notes on the code in each chapter of "x86 Assembly Language: From Real Mode to Protected Mode", with parts of the code annotated · ☆10 · Apr 12, 2026 · Updated 2 weeks ago
- A Google high-resolution image crawler · ☆13 · Jan 7, 2020 · Updated 6 years ago
- ☆14 · Mar 29, 2026 · Updated last month
- Applying GPUs in ML and DL · ☆68 · Mar 23, 2026 · Updated last month
- ☆23 · Feb 16, 2022 · Updated 4 years ago
- A repo based on XiLin Li's PSGD repo that extends some of the experiments · ☆14 · Oct 7, 2024 · Updated last year
- Fast parallel RNN-Transducer · ☆10 · Nov 1, 2019 · Updated 6 years ago
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code · ☆475 · Mar 10, 2025 · Updated last year
- A student training project for HLS and Transformers · ☆11 · Oct 19, 2022 · Updated 3 years ago
- C++ implementation of a simple virtual machine · ☆14 · Sep 19, 2014 · Updated 11 years ago
- Backtracking regular expression engine written in Python · ☆13 · Nov 4, 2022 · Updated 3 years ago
- ☆23 · Apr 7, 2026 · Updated 3 weeks ago
- ☆1,350 · Mar 31, 2026 · Updated last month
- Ring-attention experiments · ☆166 · Oct 17, 2024 · Updated last year
- Implement Neural Networks in CUDA from Scratch · ☆24 · May 17, 2024 · Updated last year
- ☆18 · Oct 12, 2022 · Updated 3 years ago
- Source code to accompany a research paper on training multi-token prediction language models using self-distillation · ☆35 · Feb 21, 2026 · Updated 2 months ago