sifakis / CS639S23_Demos
Software artifacts and Demos for CS639 (Spring 2023) "Parallel and Throughput-Optimized Programming"
☆18 Updated 2 years ago
Alternatives and similar repositories for CS639S23_Demos
Users who are interested in CS639S23_Demos are comparing it to the libraries listed below.
- General Matrix Multiplication using NVIDIA Tensor Cores ☆25 Updated 10 months ago
- A set of hands-on tutorials for CUDA programming ☆242 Updated last year
- Code and data for paper "(How) do Language Models Track State?" ☆20 Updated 8 months ago
- ☆64 Updated last week
- ☆28 Updated 2 months ago
- 11-785 Introduction to Deep Learning (IDeeL) website with logistics and select course materials ☆70 Updated this week
- A lightweight, user-friendly data-plane for LLM training. ☆37 Updated 2 months ago
- Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming" ☆94 Updated 2 years ago
- CUDA implementation of a parallel Depth First Search (DFS) algorithm and its comparison with a serial C++ DFS implementation. ☆29 Updated 7 years ago
- Introduction to CUDA programming and debugging ☆16 Updated 3 years ago
- Kinetics: Rethinking Test-Time Scaling Laws ☆82 Updated 4 months ago
- Neural Optimal Transport with Lagrangian Costs ☆60 Updated 6 months ago
- CUDA Guide ☆74 Updated last year
- NVIDIA tools guide ☆149 Updated 10 months ago
- Experimental paper writing linter. ☆35 Updated last year
- PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System] ☆44 Updated last month
- ☆50 Updated this week
- A curated list of awesome GPGPU (CUDA/OpenCL/Vulkan) resources ☆102 Updated 2 years ago
- Introduction to CUDA programming ☆129 Updated 8 years ago
- Parallel framework for training and fine-tuning deep neural networks ☆70 Updated 2 weeks ago
- ☆147 Updated 5 months ago
- Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models (ICLR 2024) ☆14 Updated 6 months ago
- Memory Optimizations for Deep Learning (ICML 2023) ☆111 Updated last year
- ☆32 Updated last year
- Autocomp: AI Code Optimizer for Tensor Accelerators ☆50 Updated this week
- The evaluation framework for training-free sparse attention in LLMs ☆104 Updated last month
- ☆66 Updated 5 months ago
- Parallelizing non-linear sequential models over the sequence length ☆55 Updated 5 months ago
- Abstractions of memory, allocator, vector, tuple, shared_ptr, unique_ptr, bitset, variant and string working on both CPU and GPU ☆31 Updated 3 months ago
- Custom PTX Instruction Benchmark ☆134 Updated 9 months ago