nnperfwins / nnPerf
☆11, updated 11 months ago
Alternatives and similar repositories for nnPerf:
Users interested in nnPerf are comparing it to the repositories listed below.
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom '22] (☆18, updated 2 years ago)
- ☆77, updated last year
- Code for the paper "ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection" [MobiSys '23] (☆12, updated last year)
- [MobiSys 2020] Fast and Scalable In-memory Deep Multitask Learning via Neural Weight Virtualization (☆16, updated 4 years ago)
- mperf: an operator performance-tuning toolbox for mobile/embedded platforms (☆179, updated last year)
- A list of awesome edge-AI inference papers (☆95, updated last year)
- MobiSys #114 (☆21, updated last year)
- Deploying Transformer models for computer vision to mobile devices (☆17, updated 3 years ago)
- ☆19, updated 3 years ago
- ☆199, updated last year
- The official implementation of TinyTrain [ICML '24] (☆21, updated 8 months ago)
- Multi-DNN inference engine for heterogeneous mobile processors (☆31, updated 8 months ago)
- zTT: Learning-based DVFS with Zero Thermal Throttling for Mobile Devices [MobiSys '21], artifact evaluation (☆24, updated 3 years ago)
- Manually implemented quantization-aware training (☆21, updated 2 years ago)
- A simplified flash-attention implementation built with CUTLASS, intended as a teaching resource (☆38, updated 7 months ago)
- ☆68, updated 2 months ago
- Code for the ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices" (☆51, updated 2 months ago)
- DeeperGEMM: a heavily optimized version (☆63, updated 2 weeks ago)
- ☆141, updated 2 years ago
- A DNN inference latency prediction toolkit for accurately modeling and predicting latency on diverse edge devices (☆345, updated 8 months ago)
- A PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware (☆107, updated 4 months ago)
- ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance⚡️ (☆65, updated this week)
- Official repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization" (☆29, updated last year)
- ☆61, updated 4 months ago
- [ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization (☆94, updated 2 years ago)
- Paella: Low-Latency Model Serving with Virtualized GPU Scheduling (☆58, updated 11 months ago)
- A repository of personal notes and annotated papers from daily research (☆117, updated this week)
- ☆17, updated 4 years ago
- TFLite model analyzer & memory optimizer (☆124, updated last year)
- nnScaler: Compiling DNN models for Parallel Training (☆103, updated last month)