☆220Aug 2, 2024Updated last year
Alternatives and similar repositories for Programming-Massively-Parallel-Processors
Users that are interested in Programming-Massively-Parallel-Processors are comparing it to the libraries listed below
- ☆49Apr 15, 2024Updated last year
- CUDA: 6 major parallel computing patterns, code and notes☆61Jul 30, 2020Updated 5 years ago
- Create cohorts from databases utilizing the OMOP CDM☆11May 19, 2025Updated 9 months ago
- Material for gpu-mode lectures☆5,800Feb 1, 2026Updated last month
- A collection of Topology Methods in Deep Learning☆17Jun 19, 2020Updated 5 years ago
- ☆18Jan 4, 2024Updated 2 years ago
- Optimized Parallel Tiled Approach to perform Matrix Multiplication by taking advantage of the lower latency, higher bandwidth shared memo…☆16Sep 24, 2017Updated 8 years ago
- General Matrix Multiplication using NVIDIA Tensor Cores☆28Jan 25, 2025Updated last year
- Awesome code, projects, books, etc. related to CUDA☆31Feb 3, 2026Updated last month
- ☆16Dec 3, 2024Updated last year
- A library to handle measurement uncertainties & error propagation☆24Oct 15, 2024Updated last year
- Code base and slides for ECE408:Applied Parallel Programming On GPU.☆144Jul 2, 2021Updated 4 years ago
- Repository of AI resources from workshops hosted by ACM AI at UCSD 🧠☆20Feb 3, 2026Updated last month
- Nim port of a simple 2D physics engine☆21Jan 1, 2022Updated 4 years ago
- Step-by-step optimization of CUDA SGEMM☆433Mar 30, 2022Updated 3 years ago
- Optimize softmax in triton in many cases☆23Sep 6, 2024Updated last year
- Dissecting NVIDIA GPU Architecture☆116Jul 11, 2022Updated 3 years ago
- Microprocessor 2 Lab Template☆11Apr 29, 2024Updated last year
- A Minimalistic Auto-Diff Optimization Framework for Teaching and Understanding Pytorch☆26Updated this week
- ⛅ Run OpenVSCode Server in Google Cloud Shell☆11Dec 22, 2023Updated 2 years ago
- Triton Compiler related materials.☆42Jan 4, 2025Updated last year
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.☆461Mar 10, 2025Updated 11 months ago
- Some CUDA projects and utility☆27Nov 7, 2019Updated 6 years ago
- A repository to host examples for Prologue framework written in Nim language.☆36Mar 19, 2023Updated 2 years ago
- TOON as DSPy adapter☆25Feb 1, 2026Updated last month
- ABSTRACT: In this paper, a two-stage grid connected photovoltaic system present which consists of inverter and dc-dc converter (Boost con…☆11Sep 15, 2021Updated 4 years ago
- how to optimize some algorithm in cuda.☆2,825Feb 15, 2026Updated 2 weeks ago
- Linux on 400MHz ARM926 AT91SAM9N12☆34Aug 5, 2022Updated 3 years ago
- 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉☆5,022Feb 25, 2026Updated last week
- ☆97Mar 26, 2025Updated 11 months ago
- ☆13Feb 24, 2026Updated last week
- ☆13Updated this week
- A parser for PTX 6.5☆13Jun 19, 2023Updated 2 years ago
- A simple LED sequencer based Graduation Cap☆13Jul 16, 2022Updated 3 years ago
- Open-source Human Feedback Library☆11Oct 25, 2023Updated 2 years ago
- Tool to generate Android build system files (Android.mk, Android.bp) from APK automatically.☆10Nov 1, 2021Updated 4 years ago
- Clock Domain Crossing Design (using the MCP formulation without feedback)☆12Jan 3, 2020Updated 6 years ago
- Hardware implementation of a Fixed Point Recursive Forward and Inverse FFT algorithm☆16Mar 3, 2018Updated 8 years ago
- Complete software package for the Iris Lunar Rover (CMU).☆16Feb 23, 2026Updated last week