casys-kaist / casys-kaist.github.io

☆16

Alternatives and similar repositories for casys-kaist.github.io

Users who are interested in casys-kaist.github.io are comparing it to the repositories listed below.

ranggihwang / Pregated_MoE
☆53 Updated last year
casys-kaist / LLMServingSim
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale
☆142 Updated 2 months ago
snu-comparch / InfiniGen
InfiniGen: Efficient Generative Inference of Large Language Models with Dynamic KV Cache Management (OSDI'24)
☆154 Updated last year
VIA-Research / vTrain
☆74 Updated 4 months ago
abhibambhaniya / GenZ-LLM-Analyzer
LLM Inference analyzer for different hardware platforms
☆94 Updated 2 months ago
mutinifni / splitwise-sim
LLM serving cluster simulator
☆115 Updated last year
PrincetonUniversity / LLMCompass
☆189 Updated last year
SJTU-ReArch-Group / Paper-Reading-List
☆129 Updated this week
HPMLL / DTC-SpMM_ASPLOS24
☆37 Updated last year
sitar-lab / NeuSight
☆52 Updated 3 months ago
hgyhungry / ShflBW_Sparse_NN
☆16 Updated 2 years ago
HPMLL / SpInfer_EuroSys25
☆24 Updated 6 months ago
casys-kaist / NeuPIMs
NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing
☆94 Updated last year
eth-easl / orion
An interference-aware scheduler for fine-grained GPU sharing
☆147 Updated 8 months ago
goliaro / specinfer-ae
☆24 Updated last year
ParCIS / Magicube
Magicube is a high-performance library for quantized sparse matrix operations (SpMM and SDDMM) of deep learning on Tensor Cores.
☆89 Updated 2 years ago
atomicapple0 / libsmctrl
Artifact from "Hardware Compute Partitioning on NVIDIA GPUs". THIS IS A FORK OF BAKITAS REPO
☆34 Updated last year
CRAFT-THU / RoDe
A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs
☆24 Updated last year
VIA-Research / dpsgd_profiler
☆20 Updated 9 months ago
LoongServe / LoongServe
☆122 Updated 10 months ago
monellz / FlashTensor
☆15 Updated 7 months ago
calculon-ai / calculon
☆154 Updated last year
Sys-KU / DeepPlan
[ACM EuroSys 2023] Fast and Efficient Model Serving Using Multi-GPUs with Direct-Host-Access
☆57 Updated 2 months ago
parasailteam / coconet
☆83 Updated 2 years ago
UDC-GAC / venom
A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
☆53 Updated last year
MachineLearningSystem / 25ASPLOS-Medusa
Medusa: Accelerating Serverless LLM Inference with Materialization [ASPLOS'25]
☆11 Updated 10 months ago
microsoft / SparTA
☆151 Updated last year
DicardoX / Research-Space
This repository is established to store personal notes and annotated papers during daily research.
☆154 Updated this week
reed-lau / cute-gemm
☆135 Updated 10 months ago
uchuhimo / amanda
☆18 Updated last year