nnperfwins / nnPerf
☆11, updated 11 months ago
Alternatives and similar repositories for nnPerf:
Users interested in nnPerf are comparing it to the repositories listed below.
- The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading" [MobiCom '22] (☆18, updated 2 years ago)
- ☆77, updated last year
- Code for the paper "ElasticTrainer: Speeding Up On-Device Training with Runtime Elastic Tensor Selection" [MobiSys '23] (☆12, updated last year)
- [MobiSys 2020] Fast and Scalable In-memory Deep Multitask Learning via Neural Weight Virtualization (☆16, updated 4 years ago)
- mperf: an operator performance-tuning toolbox for mobile/embedded platforms (☆179, updated last year)
- A list of awesome edge-AI inference papers (☆95, updated last year)
- MobiSys #114 (☆21, updated last year)
- Deploying Transformer models for computer vision to mobile devices (☆17, updated 3 years ago)
- ☆19, updated 3 years ago
- ☆199, updated last year
- The official implementation of TinyTrain [ICML '24] (☆21, updated 8 months ago)
- Multi-DNN inference engine for heterogeneous mobile processors (☆31, updated 8 months ago)
- zTT: Learning-based DVFS with Zero Thermal Throttling for Mobile Devices [MobiSys '21], artifact evaluation (☆24, updated 3 years ago)
- Manually implemented quantization-aware training (☆21, updated 2 years ago)
- A simplified flash-attention implementation built with CUTLASS, intended as a teaching resource (☆38, updated 7 months ago)
- ☆68, updated 2 months ago
- Code for the ACM MobiCom 2024 paper "FlexNN: Efficient and Adaptive DNN Inference on Memory-Constrained Edge Devices" (☆51, updated 2 months ago)
- DeeperGEMM: a heavily optimized version (☆63, updated 2 weeks ago)
- ☆141, updated 2 years ago
- A DNN inference latency prediction toolkit for accurately modeling and predicting latency on diverse edge devices (☆345, updated 8 months ago)
- A PyTorch extension for emulating FP8 data formats on standard FP32 Xeon/GPU hardware (☆107, updated 4 months ago)
- ⚡️Write HGEMM from scratch using Tensor Cores with the WMMA, MMA, and CuTe APIs, achieving peak performance⚡️ (☆65, updated this week)
- Official repo for "LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization" (☆29, updated last year)
- ☆61, updated 4 months ago
- [ICLR 2022 Oral] F8Net: Fixed-Point 8-bit Only Multiplication for Network Quantization (☆94, updated 2 years ago)
- Paella: Low-Latency Model Serving with Virtualized GPU Scheduling (☆58, updated 11 months ago)
- A repository of personal notes and annotated papers from daily research (☆117, updated this week)
- ☆17, updated 4 years ago
- TFLite model analyzer & memory optimizer (☆124, updated last year)
- nnScaler: Compiling DNN models for Parallel Training (☆103, updated last month)