sifakis / CS639S23_Demos

Software artifacts and Demos for CS639 (Spring 2023) "Parallel and Throughput-Optimized Programming"

☆17

Alternatives and similar repositories for CS639S23_Demos:

Users who are interested in CS639S23_Demos are comparing it to the repositories listed below.

stanford-cs149 / intro_to_cuda
Introduction to CUDA programming and debugging
☆13 Updated 2 years ago
axonn-ai / axonn
A parallel framework for training deep neural networks
☆54 Updated last week
hyhieu / easy_pybind
☆32 Updated 8 months ago
riverstone496 / awesome-second-order-optimization
☆25 Updated last year
YinTat / optimizationbook
☆134 Updated this week
habanero-lab / APPy
APPy (Annotated Parallelism for Python) enables users to annotate loops and tensor expressions in Python with compiler directives akin to…
☆23 Updated 2 weeks ago
divyanshu-talwar / Parallel-DFS
CUDA implementation of a parallel Depth First Search (DFS) algorithm and its comparison with a serial C++ DFS implementation.
☆29 Updated 6 years ago
CUDA-Tutorial / CodeSamples
Code samples for the CUDA tutorial "CUDA and Applications to Task-based Programming"
☆89 Updated last year
ThoenigAdrian / NeuralNetworksCudaTutorial
Implement Neural Networks in CUDA from Scratch
☆22 Updated 9 months ago
GindaChen / FlexFlashAttention3
FlexAttention w/ FlashAttention3 Support
☆26 Updated 5 months ago
unixpickle / learn-ptx
Learning about CUDA by writing PTX code.
☆106 Updated last year
EricDarve / cme213-spring-2021
CME 213 Spring 2021
☆64 Updated 3 years ago
david-m-rosen / Preconditioners
A set of useful algebraic preconditioners for iterative numerical linear-algebraic methods.
☆18 Updated 2 years ago
facebookresearch / adaptive_scheduling
Experimental scripts for researching data adaptive learning rate scheduling.
☆23 Updated last year
hpcgarage / cuASR
cuASR: CUDA Algebra for Semirings
☆35 Updated 2 years ago
moritztng / grayskull-attention
Attention in SRAM on Tenstorrent Grayskull
☆31 Updated 7 months ago
spcl / sten
Sparsity support for PyTorch
☆34 Updated 3 weeks ago
mosharaf / cse585
Advanced Scalable Systems for X
☆31 Updated 3 months ago
puttsk / cuda-tutorial
A set of hands-on tutorials for CUDA programming
☆212 Updated 10 months ago
YiteWang / NTK-SAP
[ICLR2023] NTK-SAP: Improving neural network pruning by aligning training dynamics
☆18 Updated last year
sandeepkumar-skb / pytorch_custom_op
End-to-end steps for adding custom ops in PyTorch.
☆20 Updated 4 years ago
dhyuan99 / VecKM
Official GitHub repo for VecKM. A very efficient and descriptive local geometry encoder / point tokenizer / patch embedder. ICML2024.
☆28 Updated 2 months ago
MDK8888 / vllmini
A minimal implementation of vLLM.
☆34 Updated 7 months ago
ucbrise / cs294-ai-sys-sp22
CS294 AI Systems Class Website
☆15 Updated 2 years ago
alexzhang13 / Triton-Puzzles-Solutions
Personal solutions to the Triton Puzzles
☆18 Updated 7 months ago
NonvolatileMemory / flash_tree_attn
☆16 Updated 2 months ago