csce585-mlsystems / project-athena
This is the course project for CSCE585: ML Systems. Students will build their own machine learning systems on top of the provided infrastructure, Athena.
☆13Updated 4 years ago
Alternatives and similar repositories for project-athena:
Users who are interested in project-athena are comparing it to the libraries listed below.
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions☆21Updated 2 years ago
- Official repository for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆19Updated 10 months ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆19Updated last year
- An external memory allocator example for PyTorch.☆14Updated 3 years ago
- SyReNN: Symbolic Representations for Neural Networks☆40Updated last year
- An Attention Superoptimizer☆20Updated 8 months ago
- Depict GPU memory footprint during DNN training of PyTorch☆11Updated 2 years ago
- Slides from 2021-12-15 talk, "TVM Developer Bootcamp – Writing Hardware Backends"☆10Updated 2 years ago
- ☆11Updated 3 years ago
- You Only Search Once: On Lightweight Differentiable Architecture Search for Resource-Constrained Embedded Platforms☆10Updated last year
- An 8-/16-/32-/64-bit floating point number family☆16Updated 2 years ago
- AutoCAT: Reinforcement Learning for Automated Exploration of Cache-Timing Attacks☆44Updated last year
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation☆27Updated 5 years ago
- ☆12Updated 2 years ago
- A source-to-source compiler for optimizing CUDA dynamic parallelism by aggregating launches☆14Updated 5 years ago
- LLVM-Canon aims to transform LLVM modules into a canonical form by reordering and renaming instructions while preserving the same semanti…☆14Updated 8 months ago
- "Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices", official implementation☆28Updated last year
- ☆18Updated 5 years ago
- A "gym" style toolkit for building lightweight NAS systems.☆13Updated 2 years ago
- A Sparse-tensor Communication Framework for Distributed Deep Learning☆13Updated 3 years ago
- Collection of Papers and Trials on Deep Learning to aid EE design☆38Updated 4 years ago
- ☆22Updated 4 years ago
- Some microbenchmarks and design docs before commencement☆12Updated 3 years ago
- [CF ’20] Verified Instruction-Level Energy Consumption Measurement for NVIDIA GPUs☆15Updated 4 years ago
- ☆20Updated last year
- Honest-but-Curious Nets: Sensitive Attributes of Private Inputs Can Be Secretly Coded into the Classifiers' Outputs (ACM CCS'21)☆18Updated 2 years ago
- This is a repo that contains some details about how to use the OpenCL backend (Xilinx/Intel).☆24Updated 5 years ago
- MLSys 2021 paper: MicroRec: efficient recommendation inference by hardware and data structure solutions☆16Updated 3 years ago
- An implementation of a BinaryConnect network for cifar10☆11Updated 5 years ago
- Tacker: Tensor-CUDA Core Kernel Fusion for Improving the GPU Utilization while Ensuring QoS☆18Updated 3 years ago