sifakis / CS639S23_Demos
Software artifacts and Demos for CS639 (Spring 2023) "Parallel and Throughput-Optimized Programming"
☆18Updated 2 years ago
Alternatives and similar repositories for CS639S23_Demos
Users that are interested in CS639S23_Demos are comparing it to the libraries listed below
- General Matrix Multiplication using NVIDIA Tensor Cores☆27Updated 11 months ago
- Code and data for paper "(How) do Language Models Track State?"☆21Updated 9 months ago
- all the materials for cs140e winter 2026☆27Updated this week
- A set of hands-on tutorials for CUDA programming☆245Updated last year
- Introduction to CUDA programming and debugging☆17Updated 3 years ago
- High-Performance SGEMM on CUDA devices☆115Updated 11 months ago
- CUDA implementation of a parallel Depth First Search (DFS) algorithm and its comparison with a serial C++ DFS implementation.☆29Updated 7 years ago
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU☆31Updated 5 months ago
- Parallel framework for training and fine-tuning deep neural networks☆69Updated 2 months ago
- ☆79Updated last month
- ☆148Updated 7 months ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"☆95Updated 2 years ago
- Kinetics: Rethinking Test-Time Scaling Laws☆85Updated 6 months ago
- Implementation-focused introduction to Lie groups for roboticists☆25Updated 7 years ago
- An implementation of parallel exclusive scan in CUDA☆65Updated 7 years ago
- CUDA Guide☆75Updated 2 years ago
- Udacity CS344 Introduction to Parallel Programming (https://classroom.udacity.com/courses/cs344), with assignments/materials updated to …☆46Updated 4 years ago
- A* implementation for NVIDIA GPU☆72Updated 5 years ago
- Custom PTX Instruction Benchmark☆137Updated 10 months ago
- Sparsity support for PyTorch☆38Updated 9 months ago
- A curated list for Efficient Large Language Models☆11Updated last year
- A novel, highly optimized CUDA implementation of the k-means algorithm.☆38Updated 3 years ago
- The evaluation framework for training-free sparse attention in LLMs☆108Updated 2 months ago
- Learn OpenMP examples step by step☆101Updated 11 months ago
- Complete software package for the Iris Lunar Rover (CMU).☆16Updated last year
- Attention in SRAM on Tenstorrent Grayskull☆40Updated last year
- Loop Nest - Linear algebra compiler and code generator.☆21Updated 3 years ago
- Personal solutions to the Triton Puzzles☆20Updated last year
- Learning about CUDA by writing PTX code.☆151Updated last year
- Direct solver for sparse SPD matrices for nonlinear optimization. Implements supernodal Cholesky decomposition algorithm, and supports GP…☆97Updated 3 months ago