☆226Aug 2, 2024Updated last year
Alternatives and similar repositories for Programming-Massively-Parallel-Processors
Users that are interested in Programming-Massively-Parallel-Processors are comparing it to the libraries listed below.
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…☆78Jan 21, 2021Updated 5 years ago
- Solution of Programming Massively Parallel Processors☆50Jan 15, 2024Updated 2 years ago
- The 6 major CUDA parallel computing patterns: code and notes☆62Jul 30, 2020Updated 5 years ago
- ☆49Apr 15, 2024Updated 2 years ago
- Material for gpu-mode lectures☆5,945Feb 1, 2026Updated 2 months ago
- Optimized Parallel Tiled Approach to perform Matrix Multiplication by taking advantage of the lower latency, higher bandwidth shared memo…☆16Sep 24, 2017Updated 8 years ago
- ezDPS: An Efficient and Zero-Knowledge Machine Learning Inference Pipeline☆21Jul 14, 2023Updated 2 years ago
- Dissecting NVIDIA GPU Architecture☆119Jul 11, 2022Updated 3 years ago
- Code base and slides for ECE408:Applied Parallel Programming On GPU.☆143Jul 2, 2021Updated 4 years ago
- Embedded graphics library to create beautiful UIs for any MCU, MPU and display type.☆11Apr 29, 2024Updated last year
- GameBoy emulator written from scratch☆17Apr 5, 2015Updated 11 years ago
- CUDA GPU Benchmark☆38Jan 31, 2025Updated last year
- Awesome code, projects, books, etc. related to CUDA☆32Mar 30, 2026Updated 2 weeks ago
- ☆18Jan 4, 2024Updated 2 years ago
- For popular software systems, the number of daily submitted bug reports is high. Triaging these incoming bugs is a time consuming task. M…☆11Jan 8, 2016Updated 10 years ago
- ☆26Oct 11, 2022Updated 3 years ago
- Verilog implementation of an SPI slave interface. Initially targeted for Atlys devkit (Xilinx Spartan-6) controlled by TotalPhase Cheetah…☆41Jan 8, 2025Updated last year
- Triton Compiler related materials.☆43Mar 16, 2026Updated last month
- AHCI BIOS Security Extension☆15May 16, 2021Updated 4 years ago
- Dronet, adapted for Pytorch.☆11Oct 21, 2025Updated 5 months ago
- Acceleration codes for the Ozaki-scheme on integer matrix multiplication units.☆25Dec 10, 2025Updated 4 months ago
- Vector search using only Parquet and DataFusion☆55Feb 11, 2026Updated 2 months ago
- Step-by-step optimization of CUDA SGEMM☆455Mar 30, 2022Updated 4 years ago
- A reimplementation of LIIF (Learning Continuous Image Representation with Local Implicit Image Function) using lightning+hydra☆11Mar 26, 2021Updated 5 years ago
- Enable everyone to develop, optimize and deploy AI models natively on everyone's devices.☆10Aug 19, 2023Updated 2 years ago
- Cleanlab Vizzy: illustrating the core ideas behind the Cleanlab algorithm☆16Apr 19, 2023Updated 2 years ago
- A collection of Topology Methods in Deep Learning☆18Jun 19, 2020Updated 5 years ago
- Code for "Adaptive Self-improvement LLM Agentic System for ML Library Development" (ICML 2025)☆15Jan 6, 2026Updated 3 months ago
- Design and UVM Verification of an ALU☆11Jun 14, 2024Updated last year
- moved to https://github.com/Zhaoyilunnn/qdao☆11Aug 30, 2023Updated 2 years ago
- ☆12Mar 28, 2024Updated 2 years ago
- Tapeouts done using OpenFASOC☆18Nov 3, 2025Updated 5 months ago
- Load and run Llama from safetensors files in C☆15Oct 24, 2024Updated last year
- [VLDB'24] Blitzcrank compresses in-memory OLTP databases. It introduces a new entropy coding algorithm named Delayed Coding.☆39Sep 20, 2024Updated last year
- Bust Calculator is a desktop application that calculates bust size based on image measurements.☆15Feb 28, 2025Updated last year
- A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.☆470Mar 10, 2025Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention☆474May 30, 2025Updated 10 months ago
- CUDA solutions for the lab assignments in the UIUC-ECE408 Applied Parallel Programming course.☆19Apr 18, 2023Updated 2 years ago
- Fast CUDA matrix multiplication from scratch☆1,127Sep 2, 2025Updated 7 months ago