ucbrise / cs294-ai-sys-sp22
CS294 AI Systems Class Website
☆16Updated 3 years ago
Alternatives and similar repositories for cs294-ai-sys-sp22
Users who are interested in cs294-ai-sys-sp22 are comparing it to the repositories listed below
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing.☆100Updated 4 months ago
- Tutorials for NVIDIA CUPTI samples☆40Updated 3 weeks ago
- FractalTensor is a programming framework that introduces a novel approach to organizing data in deep neural networks (DNNs) as a list of …☆29Updated 11 months ago
- ☆50Updated 6 months ago
- Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning☆23Updated 6 months ago
- This repository contains companion software for the Colfax Research paper "Categorical Foundations for CuTe Layouts".☆77Updated last month
- Open ABI and FFI for Machine Learning Systems☆174Updated last week
- Distributed SDDMM Kernel☆11Updated 3 years ago
- A language and compiler for irregular tensor programs.☆151Updated 11 months ago
- ☆93Updated last year
- ☆24Updated last year
- ☆71Updated 10 months ago
- GPU Performance Advisor☆65Updated 3 years ago
- A memory profiler for NVIDIA GPUs to explore memory inefficiencies in GPU-accelerated applications.☆26Updated last year
- ☆37Updated 2 weeks ago
- GitHub mirror of the triton-lang/triton repo.☆98Updated last week
- ☆26Updated 9 months ago
- Framework to reduce autotune overhead to zero for well known deployments.☆85Updated 2 months ago
- Official repository for "IPDPS'24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Updated last year
- How to ensure correctness and ship LLM generated kernels in PyTorch☆121Updated last week
- TiledLower is a Dataflow Analysis and Codegen Framework written in Rust.☆14Updated last year
- ☆64Updated 5 months ago
- ☆31Updated 3 years ago
- An IR for efficiently simulating distributed ML computation.☆30Updated last year
- DeeperGEMM: crazy optimized version☆73Updated 6 months ago
- MLIR-based partitioning system☆148Updated this week
- An Attention Superoptimizer☆22Updated 10 months ago
- A Top-Down Profiler for GPU Applications☆22Updated last year
- A size profiler for CUDA binaries☆52Updated last month
- Multi-Level Triton Runner supporting Python, IR, PTX, and cubin.☆76Updated last week