parrotsky / AutoDiCE
Distributed CNN inference at the edge; extends ncnn with CUDA and MPI+OpenMP support.
☆18Updated last year
Alternatives and similar repositories for AutoDiCE
Users that are interested in AutoDiCE are comparing it to the libraries listed below
- A Portable C Library for Distributed CNN Inference on IoT Edge Clusters☆82Updated 5 years ago
- PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices☆33Updated last year
- This is a list of awesome edgeAI inference related papers.☆96Updated last year
- ☆40Updated 4 years ago
- Source code for the paper: "A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems"☆22Updated 4 years ago
- To deploy Transformer models in CV to mobile devices.☆18Updated 3 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo☆18Updated 2 years ago
- ☆39Updated 5 years ago
- ☆14Updated 3 years ago
- Code for the paper "NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks"☆21Updated 5 years ago
- Multi-branch model for concurrent execution☆17Updated last year
- ☆36Updated 6 years ago
- ☆14Updated 3 years ago
- MobiSys#114☆21Updated last year
- Experimental deep learning framework written in Rust☆14Updated 2 years ago
- ☆19Updated 3 years ago
- Official implementation for ECCV 2022 paper LIMPQ - "Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance"☆54Updated 2 years ago
- Implementation for the paper "AdaTune: Adaptive Tensor Program Compilation Made Efficient" (NeurIPS 2020).☆13Updated 4 years ago
- Official repository for "IPDPS '24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".☆20Updated last year
- CPrune: Compiler-Informed Model Pruning for Efficient Target-Aware DNN Execution☆17Updated last year
- CMix-NN: Mixed Low-Precision CNN Library for Memory-Constrained Edge Devices☆43Updated 5 years ago
- [CVPRW 2021] Dynamic-OFA: Runtime DNN Architecture Switching for Performance Scaling on Heterogeneous Embedded Platforms☆29Updated 2 years ago
- A Winograd Minimal Filter Implementation in CUDA☆24Updated 3 years ago
- ☆18Updated 4 years ago
- ☆23Updated this week
- Create tiny ML systems for on-device learning.☆20Updated 3 years ago
- An external memory allocator example for PyTorch.☆14Updated 3 years ago
- This is the open-source version of TinyTS. The code is dirty so far. We may clean the code in the future.☆17Updated 10 months ago
- Ok-Topk is a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k c…☆26Updated 2 years ago
- MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)☆51Updated last year