UbiquitousLearning / Mandheling-DSP-TrainingLinks

The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]

☆19

Alternatives and similar repositories for Mandheling-DSP-Training

Users that are interested in Mandheling-DSP-Training are comparing it to the libraries listed below

Sorting:

apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆28Updated last year
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89Updated 2 years ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆55Updated last year
pku-liang / AMOS
Automatic Mapping Generation, Verification, and Exploration for ISA-based Spatial Accelerators
☆115Updated 3 years ago
microsoft / SparTA
☆154Updated last year
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆92Updated 2 years ago
UofT-EcoSystem / DietCode
DietCode Code Release
☆65Updated 3 years ago
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46Updated last year
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆122Updated 3 years ago
marsupialtail / gpu-sparsert
☆18Updated 5 years ago
csu-eis / CoDL
☆78Updated 2 years ago
zhaiyi000 / tlm
☆45Updated last year
tlc-pack / tenset
☆93Updated 2 years ago
parasailteam / coconet
☆83Updated 2 years ago
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆125Updated last week
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆94Updated 3 months ago
lixiuhong / batched_gemm
☆39Updated 5 years ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆38Updated 7 months ago
sunlex0717 / DissectingTensorCores
☆109Updated last year
uwsampl / SparseTIR
SparseTIR: Sparse Tensor Compiler for Deep Learning
☆138Updated 2 years ago
apuaaChen / vectorSparse
☆32Updated 3 years ago
nox-410 / tvm.tl
An extention of TVMScript to write simple and high performance GPU kernels with tensorcore.
☆51Updated last year
PrincetonUniversity / LLMCompass
☆199Updated last week
zhaiyi000 / tlp
☆41Updated last year
hgyhungry / ShflBW_Sparse_NN
☆16Updated 2 years ago
hgyhungry / alcop-artifact
☆23Updated 2 years ago
pku-liang / ArkVale
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction (NIPS'24)
☆43Updated 10 months ago
ConnollyLeon / awesome-Auto-Parallelism
A baseline repository of Auto-Parallelism in Training Neural Networks
☆147Updated 3 years ago
AlibabaResearch / mononn
☆31Updated last year
EfficientLLMSys / MuxServe
☆13Updated last year