caoting-dotcom / multiBranchModelLinks

Multi-branch model for concurrent execution

☆17

Alternatives and similar repositories for multiBranchModel

Users that are interested in multiBranchModel are comparing it to the libraries listed below

Sorting:

csu-eis / CoDL
☆78Updated 2 years ago
LeiWang1999 / tvm_gpu_gemm
play gemm with tvm
☆92Updated 2 years ago
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆54Updated last year
Kyrie-Zhao / awesome-real-time-AI
This is a list of awesome edgeAI inference related papers.
☆98Updated last year
UbiquitousLearning / Mandheling-DSP-Training
The open-source project for "Mandheling: Mixed-Precision On-Device DNN Training with DSP Offloading"[MobiCom'2022]
☆19Updated 3 years ago
thu-pacman / PET
PET: Optimizing Tensor Programs with Partially Equivalent Transformations and Automated Corrections
☆122Updated 3 years ago
pku-liang / MAGIS
MAGIS: Memory Optimization via Coordinated Graph Transformation and Scheduling for DNN (ASPLOS'24)
☆55Updated last year
qipengwang / Melon
MobiSys#114
☆22Updated 2 years ago
OpenPPL / CuAssembler
An unofficial cuda assembler, for all generations of SASS, hopefully ：）
☆83Updated 2 years ago
mit-han-lab / inter-operator-scheduler
[MLSys 2021] IOS: Inter-Operator Scheduler for CNN Acceleration
☆199Updated 3 years ago
xinetzone / tvm-book
☆18Updated this week
billmuch / matmul_perf_test
☆14Updated 3 years ago
humuyan / Korch
ASPLOS'24: Optimal Kernel Orchestration for Tensor Programs with Korch
☆37Updated 6 months ago
LeiWang1999 / TVM.CMakeExtend
Tutorials of Extending and importing TVM with CMAKE Include dependency.
☆14Updated last year
UofT-EcoSystem / DietCode
DietCode Code Release
☆65Updated 3 years ago
SJTU-IPADS / reef-artifacts
A GPU-accelerated DNN inference serving system that supports instant kernel preemption and biased concurrent execution in GPU scheduling.
☆43Updated 3 years ago
gfvvz / triton-learning-materials
Triton Compiler related materials.
☆35Updated 9 months ago
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆62Updated last year
tfruan2000 / mlsys-study-note
My study note for mlsys
☆16Updated 11 months ago
lenLRX / AmpereSparseMatmul
study of Ampere' Sparse Matmul
☆18Updated 4 years ago
OpenPPL / ppl.kernel.cpu
☆18Updated last year
tlc-pack / cutlass_fpA_intB_gemm
A standalone GEMM kernel for fp16 activation and quantized weight, extracted from FasterTransformer
☆94Updated last month
MARD1NO / CUDA-PPT
☆109Updated 6 months ago
AlibabaResearch / recom
An Optimizing Compiler for Recommendation Model Inference
☆26Updated 4 months ago
summerspringwei / souffle-ae
☆18Updated last year
Raphael-Hao / Abacus
☆38Updated 3 months ago
pku-liang / FlexTensor
Automatic Schedule Exploration and Optimization Framework for Tensor Computations
☆180Updated 3 years ago
lixiuhong / batched_gemm
☆39Updated 5 years ago
Mengjintao / FastCNN
☆20Updated 3 years ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆26Updated last year