4paradigm / canopy
Canopy is a machine learning compiler stack capable of targeting high-end FPGAs. As part of the OpenAIOS project, Canopy is an evolved version of Apache TVM. It supports a variety of hardware backends, such as PCIe-based cloud FPGAs, CPUs, and GPUs.
☆12 Updated 3 years ago
Alternatives and similar repositories for canopy:
Users that are interested in canopy are comparing it to the libraries listed below
- Slides from 2021-12-15 talk, "TVM Developer Bootcamp – Writing Hardware Backends" ☆10 Updated 3 years ago
- This is the implementation for paper: AdaTune: Adaptive Tensor Program Compilation Made Efficient (NeurIPS 2020). ☆13 Updated 3 years ago
- ☆43 Updated last year
- Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation ☆27 Updated 5 years ago
- Repository for SysML19 Artifacts Evaluation ☆53 Updated 6 years ago
- ParaDnn: A systematic performance analysis methodology for deep learning. ☆39 Updated 5 years ago
- Visualize TVM Relay program graph ☆12 Updated 5 years ago
- System for automated integration of deep learning backends. ☆47 Updated 2 years ago
- PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections ☆120 Updated 2 years ago
- TVM stack: exploring the incredible explosion of deep-learning frameworks and how to bring them together ☆64 Updated 6 years ago
- The quantitative performance comparison among DL compilers on CNN models. ☆74 Updated 4 years ago
- ☆23 Updated last year
- Boost hardware utilization for ML training workloads via Inter-model Horizontal Fusion ☆32 Updated 11 months ago
- A home for the final text of all TVM RFCs. ☆102 Updated 7 months ago
- DietCode Code Release ☆63 Updated 2 years ago
- Issues related to MLPerf™ Inference policies, including rules and suggested changes ☆62 Updated 2 months ago
- ☆69 Updated 2 years ago
- A tool for examining GPU scheduling behavior. ☆81 Updated 8 months ago
- ☆23 Updated 5 months ago
- Automatic Schedule Exploration and Optimization Framework for Tensor Computations ☆176 Updated 3 years ago
- An external memory allocator example for PyTorch. ☆14 Updated 3 years ago
- GVProf: A Value Profiler for GPU-based Clusters ☆49 Updated last year
- Artifacts for SOSP'19 paper Optimizing Deep Learning Computation with Automatic Generation of Graph Substitutions ☆21 Updated 3 years ago
- Benchmark scripts for TVM ☆74 Updated 3 years ago
- SparseTIR: Sparse Tensor Compiler for Deep Learning ☆135 Updated 2 years ago
- TVM learning and research ☆13 Updated 4 years ago
- An IR for efficiently simulating distributed ML computation. ☆28 Updated last year
- ☆47 Updated 2 years ago
- [MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration ☆197 Updated 3 years ago
- ☆11 Updated 4 years ago