ucbrise / cs294-ai-sys-sp22
CS294 AI Systems Class Website
☆16Updated 3 years ago
Alternatives and similar repositories for cs294-ai-sys-sp22
Users who are interested in cs294-ai-sys-sp22 are comparing it to the repositories listed below.
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆97Updated 3 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning☆23Updated 5 months ago
- An Attention Superoptimizer☆22Updated 8 months ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆29Updated 9 months ago
- Tutorials for NVIDIA CUPTI samples☆34Updated last month
- Distributed SDDMM Kernel☆11Updated 3 years ago
- ☆45Updated 5 months ago
- ☆25Updated last year
- MLIR-based partitioning system☆136Updated this week
- Microsoft Collective Communication Library☆66Updated 10 months ago
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".☆57Updated 2 weeks ago
- ☆31Updated 3 years ago
- Debug print operator for cudagraph debugging☆14Updated last year
- Triton-based Symmetric Memory operators and examples☆32Updated this week
- GitHub mirror of the triton-lang/triton repo.☆82Updated last week
- A Data-Centric Compiler for Machine Learning☆84Updated last year
- ☆92Updated 11 months ago
- Extensible collectives library in Triton☆89Updated 6 months ago
- ☆83Updated 2 years ago
- An experimental CPU backend for Triton (https://github.com/openai/triton)☆45Updated last month
- (NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.☆41Updated 2 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆25Updated last year
- ☆57Updated 4 months ago
- ☆27Updated 7 months ago
- ☆24Updated last year
- Artifacts of EVT ASPLOS'24☆26Updated last year
- Framework to reduce autotune overhead to zero for well known deployments.☆84Updated 3 weeks ago
- An IR for efficiently simulating distributed ML computation.☆29Updated last year
- NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer☆135Updated 3 weeks ago
- A language and compiler for irregular tensor programs.☆148Updated 10 months ago