parrotsky / AutoDiCE
Distributed CNN inference at the edge; extends ncnn with CUDA and MPI+OpenMP support.
☆22 · Updated 5 months ago
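The core idea behind AutoDiCE and several of the repositories below is layer-wise model partitioning: a CNN is split into contiguous stages, each stage runs on a different edge device, and activations are forwarded between stages. A minimal, framework-free sketch of that idea is shown here; all names are illustrative, and the sequential loop stands in for the MPI send/recv that a real deployment (such as AutoDiCE's ncnn+MPI backend) would use.

```python
# Conceptual sketch of layer-wise model partitioning for distributed
# inference. Not AutoDiCE's API: AutoDiCE targets ncnn with CUDA and
# MPI+OpenMP, which this pure-Python illustration deliberately omits.

from typing import Callable, List

Layer = Callable[[float], float]

def partition(layers: List[Layer], num_devices: int) -> List[List[Layer]]:
    """Split the layer list into contiguous stages, one per device."""
    size, rem = divmod(len(layers), num_devices)
    stages, start = [], 0
    for d in range(num_devices):
        end = start + size + (1 if d < rem else 0)  # spread the remainder
        stages.append(layers[start:end])
        start = end
    return stages

def run_pipeline(stages: List[List[Layer]], x: float) -> float:
    """Each stage forwards its activations to the next stage's device
    (a stand-in for an MPI send/recv between edge nodes)."""
    for stage in stages:
        for layer in stage:
            x = layer(x)
    return x

# Toy "model": four scalar layers split across two devices.
layers = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]
stages = partition(layers, num_devices=2)
print(run_pipeline(stages, 1.0))  # prints 1.0, same as running the layers sequentially
```

Because the stages are contiguous, the partitioned pipeline is mathematically identical to unpartitioned inference; the engineering work in the projects listed below is in choosing the split points and overlapping communication with computation.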
Alternatives and similar repositories for AutoDiCE
Users interested in AutoDiCE are comparing it to the repositories listed below.
- A curated list of awesome edge-AI inference papers ☆98 · Updated 2 years ago
- ☆78 · Updated 2 years ago
- [MobiCom 24] Efficient and adaptive DNN inference under changeable memory budgets ☆58 · Updated last year
- Multi-branch model for concurrent execution ☆18 · Updated 2 years ago
- PipeEdge: Pipeline Parallelism for Large-Scale Model Inference on Heterogeneous Edge Devices ☆37 · Updated 2 years ago
- MobiSys#114 ☆23 · Updated 2 years ago
- An external memory allocator example for PyTorch ☆16 · Updated 5 months ago
- A Portable C Library for Distributed CNN Inference on IoT Edge Clusters ☆88 · Updated 5 years ago
- Source code for the paper "A Latency-Predictable Multi-Dimensional Optimization Framework for DNN-driven Autonomous Systems" ☆22 · Updated 5 years ago
- Deploying Transformer models for computer vision on mobile devices ☆18 · Updated 4 years ago
- Quantizes PyTorch models; supports post-training quantization and quantization-aware training ☆14 · Updated 2 years ago
- Official implementation of the ECCV 2022 paper LIMPQ, "Mixed-Precision Neural Network Quantization via Learned Layer-wise Importance" ☆61 · Updated 2 years ago
- A Winograd minimal-filter implementation in CUDA ☆28 · Updated 4 years ago
- PyTorch compilation tutorial covering TorchScript, torch.fx, and Slapo ☆17 · Updated 2 years ago
- Code for the paper "NeuralPower: Predict and Deploy Energy-Efficient Convolutional Neural Networks" ☆23 · Updated 6 years ago
- ☆40 · Updated 5 years ago
- zTT: Learning-based DVFS with Zero Thermal Throttling for Mobile Devices [MobiSys'21], artifact evaluation ☆28 · Updated 4 years ago
- [DAC 2024] EDGE-LLM: Enabling Efficient Large Language Model Adaptation on Edge Devices via Layerwise Unified Compression and Adaptive La… ☆78 · Updated last year
- A PyTorch implementation of TQT ☆21 · Updated 4 years ago
- Optimize tensor programs fast with Felix, a gradient-descent autotuner ☆30 · Updated last year
- CUDA project for a university course ☆26 · Updated 5 years ago
- Penn CIS 5650 (GPU Programming and Architecture) final project ☆44 · Updated 2 years ago
- My study notes for MLSys ☆15 · Updated last year
- A deep learning inference acceleration framework targeting Jetson embedded platforms, built on TensorRT ☆29 · Updated 3 months ago
- Implementation of the paper "AdaTune: Adaptive Tensor Program Compilation Made Efficient" (NeurIPS 2020) ☆14 · Updated 4 years ago
- ☆41 · Updated 5 years ago
- An out-of-the-box PyTorch scaffold for neural network quantization-aware training (QAT) research. Website: https://github.com/zhutmost/neuralz… ☆25 · Updated 3 years ago
- SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs ☆61 · Updated 10 months ago
- Official repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …" ☆36 · Updated 5 months ago
- Experimental deep learning framework written in Rust ☆15 · Updated 3 years ago