sifakis / CS639S23_Demos
Software artifacts and Demos for CS639 (Spring 2023) "Parallel and Throughput-Optimized Programming"
☆18 Updated 2 years ago
Alternatives and similar repositories for CS639S23_Demos
Users who are interested in CS639S23_Demos are comparing it to the repositories listed below
- General Matrix Multiplication using NVIDIA Tensor Cores ☆24 Updated 9 months ago
- Code and data for paper "(How) do Language Models Track State?" ☆20 Updated 7 months ago
- PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System] ☆42 Updated 3 weeks ago
- Competitive GPU kernel optimization platform. ☆113 Updated last week
- Sparsity support for PyTorch ☆37 Updated 7 months ago
- ☆147 Updated 5 months ago
- Parallel framework for training and fine-tuning deep neural networks ☆65 Updated 2 weeks ago
- ☆36 Updated last month
- Implementation of a methodology that allows all sorts of user-defined GPU kernel fusion, for non-CUDA programmers. ☆26 Updated last week
- 6.790 | Machine Learning | Draft Site/Notes ☆13 Updated this week
- ☆50 Updated last month
- A lightweight, user-friendly data-plane for LLM training. ☆36 Updated last month
- Flash Attention in 300-500 lines of CUDA/C++ ☆27 Updated 2 months ago
- ☆28 Updated last month
- Preprint | Previously at GenBio ICML 2025 ☆18 Updated 2 months ago
- Experimental paper writing linter. ☆35 Updated last year
- Personal solutions to the Triton Puzzles ☆20 Updated last year
- The evaluation framework for training-free sparse attention in LLMs ☆102 Updated 3 weeks ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆110 Updated last year
- We are applying the notion of the spectral radius to NLP and data represented as graphs. ☆10 Updated 5 months ago
- CUDA Guide ☆74 Updated last year
- Triangle in practice! Triton ☆16 Updated last year
- OpenAI 2025 ICPC Submissions ☆55 Updated last month
- First Latency-Aware Competitive LLM Agent Benchmark ☆23 Updated 5 months ago
- LLM checkpointing for DeepSpeed/Megatron ☆21 Updated 3 weeks ago
- Algorithms for approximate attention in LLMs ☆19 Updated 6 months ago
- ☆95 Updated 8 months ago
- Learning about CUDA by writing PTX code. ☆146 Updated last year
- High-Performance SGEMM on CUDA devices ☆109 Updated 9 months ago
- A set of hands-on tutorials for CUDA programming ☆240 Updated last year