icloud-ecnu / Opara

Opara is a lightweight and resource-aware DNN Operator parallel scheduling framework to accelerate the execution of DNN inference on GPUs.

☆23

Alternatives and similar repositories for Opara

Users who are interested in Opara are comparing it to the repositories listed below.

icloud-ecnu / spotDNN
spotDNN is a heterogeneity-aware spot instance provisioning framework to provide predictable performance for DDNN training workloads in t…
☆15 Updated 2 years ago
icloud-ecnu / igniter
iGniter, an interference-aware GPU resource provisioning framework for achieving predictable performance of DNN inference in the cloud.
☆38 Updated last year
icloud-ecnu / ispot
iSpot is a lightweight and cost-effective instance provisioning framework for Directed Acyclic Graph (DAG)-style big data analytics, in …
☆11 Updated 2 years ago
icloud-ecnu / delaystage
DelayStage is a simple yet effective stage delay scheduling strategy to interleave the cluster resources across the parallel stages, so a…
☆14 Updated 2 years ago
icloud-ecnu / ebrowser
ebrowser, an energy-efficient and lightweight human interaction framework without degrading the user experience in mobile Web browsers.
☆12 Updated 2 years ago
icloud-ecnu / paper-reading-list
Reading paper list for iCloud group
☆14 Updated last week
icloud-ecnu / Prophet
Prophet is a predictable communication scheduling strategy to schedule the gradient transfer in an adequate order, with the aim of maximi…
☆16 Updated 2 years ago
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆155 Updated 2 weeks ago
icloud-ecnu / CCC2023
☆12 Updated 2 years ago
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆147 Updated 8 months ago
LLMServe / SwiftTransformer
High performance Transformer implementation in C++.
☆135 Updated 9 months ago
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆155 Updated last year
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆181 Updated last week
chenhongyu2048 / LLM-inference-optimization-paper
Summary of some awesome work for optimizing LLM inference
☆116 Updated 4 months ago
DD-DuDa / Cute-Learning
Examples of CUDA implementations by Cutlass CuTe
☆241 Updated 3 months ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM‑Tutorial: Build a DeepEP‑like GPU Buffer
☆138 Updated last month
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆118 Updated 3 weeks ago
reed-lau / cute-gemm
☆136 Updated 10 months ago
flexflow / flexflow-serve
FlexFlow Serve: Low-Latency, High-Performance LLM Serving
☆63 Updated last month
icloud-ecnu / lambdadnn
λDNN is a cost-efficient function resource provisioning framework to minimize the monetary cost and guarantee the performance for DDNN tr…
☆23 Updated last year
mental2008 / awesome-papers
Here are my personal paper reading notes (including cloud computing, resource management, systems, machine learning, deep learning, and o…
☆126 Updated this week
galeselee / Awesome_LLM_System-PaperList
Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of pap…
☆278 Updated 7 months ago
microsoft / SparTA
☆152 Updated last year
UMass-LIDS / Proteus
Proteus: A High-Throughput Inference-Serving System with Accuracy Scaling
☆13 Updated last year
mutinifni / splitwise-sim
LLM serving cluster simulator
☆116 Updated last year
parasailteam / coconet
☆83 Updated 2 years ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆66 Updated 7 months ago
eniac / paella
Paella: Low-latency Model Serving with Virtualized GPU Scheduling
☆62 Updated last year
byungsoo-oh / ml-systems-papers
Curated collection of papers in machine learning systems
☆426 Updated 2 weeks ago
monellz / FlashTensor
☆16 Updated 7 months ago