eis-lab / sageLinks

Experimental deep learning framework written in Rust

☆15

Alternatives and similar repositories for sage

Users that are interested in sage are comparing it to the libraries listed below

Sorting:

SNU-ARC / any-precision-llm
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
☆111Updated last week
casys-kaist / EnvPipe
☆25Updated last year
qipengwang / Melon
MobiSys#114
☆21Updated last year
naver-aics / lut-gemm
☆61Updated last year
Kyrie-Zhao / awesome-real-time-AI
This is a list of awesome edgeAI inference related papers.
☆96Updated last year
VIA-Research / vTrain
☆73Updated last month
microsoft / SparTA
☆149Updated 11 months ago
Sys-KU / DeepPlan
[ACM EuroSys '23] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆57Updated last year
Qualcomm-AI-research / gptvq
☆32Updated last year
S-Lab-System-Group / Awesome-ML-for-System
SOTA Learning-augmented Systems
☆36Updated 3 years ago
ranggihwang / Pregated_MoE
☆48Updated last year
hongsunjang / pipe-bd
[DATE 2023] Pipe-BD: Pipelined Parallel Blockwise Distillation
☆11Updated 2 years ago
ConstantPark / DL_Compiler
Study Group of Deep Learning Compiler
☆161Updated 2 years ago
xxyux / SpInfer
SpInfer: Leveraging Low-Level Sparsity for Efficient Large Language Model Inference on GPUs
☆50Updated 3 months ago
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆40Updated 2 years ago
limenghao / AdaTune
This is the implementation for paper: AdaTune: Adaptive Tensor Program CompilationMade Efficient (NeurIPS 2020).
☆14Updated 4 years ago
xvyaward / owq
Code for the AAAI 2024 Oral paper "OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Model…
☆63Updated last year
UbiquitousLearning / Paper-list-resource-efficient-large-language-model
☆100Updated last year
HuaizhengZhang / MIGProfiler
Multi-Instance-GPU profiling tool
☆60Updated 2 years ago
Soroosh129 / NeuOS
Source code for the paper: "A Latency-Predictable Multi-Dimensional Optimization Framework forDNN-driven Autonomous Systems"
☆22Updated 4 years ago
XRBench / XRBench-MLSys2023
A version of XRBench-MAESTRO used for MLSys 2023 publication
☆24Updated 2 years ago
LiuXiaoxuanPKU / GACT-ICML
☆42Updated 2 years ago
casys-kaist / LLMServingSim
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
☆123Updated last month
ryujaehun / one-shot-tuner
one-shot-tuner
☆8Updated 2 years ago
learning1234embed / NeuralWeightVirtualization
[MobiSys 2020] Fast and Scalable In-memory Deep Multitask Learning via Neural Weight Virtualization
☆16Updated 5 years ago
anony-sub / chameleon
Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation
☆27Updated 5 years ago
UofT-EcoSystem / DietCode
DietCode Code Release
☆64Updated 2 years ago
tlc-pack / tenset
☆92Updated 2 years ago
sjtu-epcc / DVABatch
☆19Updated 3 years ago
bytedance / QSync
Official resporitory for "IPDPS' 24 QSync: Quantization-Minimized Synchronous Distributed Training Across Hybrid Devices".
☆20Updated last year