☆53Dec 4, 2023Updated 2 years ago
Alternatives and similar repositories for ECE408
Users that are interested in ECE408 are comparing it to the libraries listed below.
- Code base and slides for ECE408: Applied Parallel Programming on GPU.☆144Jul 2, 2021Updated 4 years ago
- IMPACT GPU Algorithms Teaching Labs☆59Apr 21, 2023Updated 3 years ago
- Xiao's CUDA Optimization Guide [NO LONGER ADDING NEW CONTENT]☆325Nov 8, 2022Updated 3 years ago
- CUDA solutions for the lab assignments in the UIUC-ECE408 Applied Parallel Programming course.☆19Apr 18, 2023Updated 3 years ago
- ☆23Oct 31, 2023Updated 2 years ago
- Lab 5 project of MIT-6.5940, deploying LLaMA2-7B-chat on one's laptop with TinyChatEngine.☆18Dec 1, 2023Updated 2 years ago
- Eliminate compaction jobs in secondary nodes within a group of replicated RocksDB.☆10Jun 5, 2024Updated last year
- Examples and exercises from the book Programming Massively Parallel Processors - A Hands-on Approach. David B. Kirk and Wen-mei W. Hwu (T…☆78Jan 21, 2021Updated 5 years ago
- SystemVerilog implementation of the TAGE branch predictor☆14May 26, 2021Updated 4 years ago
- [HPCA 2026] A GPU-optimized system for efficient long-context LLMs decoding with low-bit KV cache.☆85Dec 18, 2025Updated 4 months ago
- Triton Compiler related materials.☆44Mar 16, 2026Updated last month
- a simple API to use CUPTI☆10Aug 19, 2025Updated 8 months ago
- Trajectory planning for highway situations with a classic robotics approach.☆12May 23, 2018Updated 7 years ago
- High performance RDMA-based distributed feature collection component for training GNN model on EXTREMELY large graph☆55Jul 3, 2022Updated 3 years ago
- flash attention tutorial written in python, triton, cuda, cutlass☆506Jan 20, 2026Updated 3 months ago
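- Study notes on high-performance computing, including notes and code demos for related topics; still being improved. If it helps, please give it a Star; it helps the author a lot, thanks!☆474Mar 28, 2023Updated 3 years ago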
- Machine Learning Engineer interview preparation. Brushing up Data Structures & Algorithms, System Design and SQL☆25Jun 10, 2021Updated 4 years ago
- ☆174Feb 5, 2026Updated 3 months ago
- Code from the CMU LM inference fall 2025 edition.☆41Dec 7, 2025Updated 4 months ago
- Very simple and stupid TCP/IP stack written in C☆10Mar 25, 2016Updated 10 years ago
- Fireboy & Water Girl in the Forest Temple implemented on an FPGA board for UIUC's ECE385 Digital Systems Laboratory.☆21Mar 30, 2023Updated 3 years ago
- A Light CNN Framework!☆16Apr 8, 2019Updated 7 years ago
- 🎓Automatically update circult-eda-mlsys-tinyml papers daily using GitHub Actions (updates every 8 hours)☆10Updated this week
- In-depth tutorials and examples on LLM training and inference infrastructure, such as, Pytorch, Fairscale, Nvidia AI Modules (cuDNN, tens…☆22May 19, 2025Updated 11 months ago
- SGLang is a fast serving framework for large language models and vision language models.☆21Updated this week
- Algorithm tutorial for visual SLAM☆13Nov 10, 2019Updated 6 years ago
- Tutorial for assignment of Introduction to Database System☆12Sep 29, 2025Updated 7 months ago
- distributed system resource☆13Jan 14, 2020Updated 6 years ago
- tensorflow fork with Salus integration☆12Jan 7, 2022Updated 4 years ago
- ☆16Mar 26, 2020Updated 6 years ago
- alibabacloud-aiacc-demo☆43May 4, 2023Updated 3 years ago
- Simple Robin Hood hash table implemented using C macros☆15Feb 7, 2025Updated last year
- Adaptive Message Quantization and Parallelization for Distributed Full-graph GNN Training☆24Mar 1, 2024Updated 2 years ago
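- This project gives a simple introduction to CUDA, covering the CUDA execution model, thread hierarchy, CUDA memory model, how to write kernel functions, and two ways to use CUDA extensions from PyTorch. It offers a basic entry point to developing PyTorch-based CUDA extensions.☆95Nov 12, 2021Updated 4 years ago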
- Fast and easy distributed model training examples.☆12Nov 26, 2024Updated last year
- Simple webapp (Google Appengine) to convert UIUC course listings into importable calendar files☆24Aug 8, 2020Updated 5 years ago
- My solution code to parallel architecture and programming Spring 2016☆12Aug 15, 2016Updated 9 years ago
- A library to abstract between different lossless and lossy compressors☆37Feb 11, 2026Updated 2 months ago
- ☆120Apr 11, 2024Updated 2 years ago