PanZaifeng / RecFlex

A recommendation model kernel optimizing system

☆12

Alternatives and similar repositories for RecFlex

Users interested in RecFlex are comparing it to the libraries listed below.


PanZaifeng / FastTree-Artifact
☆27 · Updated 7 months ago
apuaaChen / EVT_AE
Artifacts of EVT ASPLOS'24
☆28 · Updated last year
hao-ai-lab / MuxServe
☆79 · Updated last month
DachengLi1 / AMP
(NeurIPS 2022) Automatically finding good model-parallel strategies, especially for complex models and clusters.
☆43 · Updated 3 years ago
Azure / msccl
Microsoft Collective Communication Library
☆66 · Updated 11 months ago
infinigence / FlashOverlap
A lightweight design for computation-communication overlap.
☆187 · Updated last month
AlibabaResearch / recom
An Optimizing Compiler for Recommendation Model Inference
☆26 · Updated 5 months ago
alibaba / llm-scheduling-artifact
Artifact of the OSDI '24 paper "Llumnix: Dynamic Scheduling for Large Language Model Serving"
☆63 · Updated last year
ByteDance-Seed / StragglerAnalysis
☆43 · Updated 6 months ago
ParCIS / Chimera
Chimera: bidirectional pipeline parallelism for efficiently training large-scale models.
☆68 · Updated 8 months ago
zhuohan123 / terapipe
☆77 · Updated 4 years ago
awslabs / optimizing-multitask-training-through-dynamic-pipelines
Official repository for the paper DynaPipe: Optimizing Multi-task Training through Dynamic Pipelines
☆20 · Updated last year
google / iopddl
Supplemental materials for The ASPLOS 2025 / EuroSys 2025 Contest on Intra-Operator Parallelism for Distributed Deep Learning
☆23 · Updated 6 months ago
jiazhihao / attention_superoptimizer
An Attention Superoptimizer
☆22 · Updated 10 months ago
mcrl / tccl
Thunder Research Group's Collective Communication Library
☆42 · Updated 4 months ago
WukLab / preble
Stateful LLM Serving
☆88 · Updated 8 months ago
AlibabaResearch / flash-llm
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
☆223 · Updated 2 years ago
YukeWang96 / MGG_OSDI23
Artifact for OSDI'23: MGG: Accelerating Graph Neural Networks with Fine-grained intra-kernel Communication-Computation Pipelining on Mult…
☆40 · Updated last year
microsoft / nnscaler
nnScaler: Compiling DNN models for Parallel Training
☆119 · Updated last month
Infrawaves / DeepEP_ibrc_dual-ports_multiQP
Aims to implement dual-port and multi-QP solutions in the DeepEP IBRC transport
☆66 · Updated 6 months ago
LoongServe / LoongServe
☆124 · Updated last year
zhaiyi000 / tlp
☆41 · Updated last year
kungfu-team / tenplex
Dynamic resource changes for multi-dimensional parallelism training
☆29 · Updated 2 months ago
microsoft / SuperScaler
An experimental parallel training platform
☆56 · Updated last year
SJTU-IPADS / disb
DISB is a new DNN inference serving benchmark with diverse workloads and models, as well as real-world traces.
☆56 · Updated last year
meta-pytorch / KernelAgent
Autonomous GPU Kernel Generation via Deep Agents
☆123 · Updated last week
Raphael-Hao / brainstorm
Compiler for Dynamic Neural Networks
☆46 · Updated 2 years ago
KuangjuX / NVSHMEM-Tutorial
NVSHMEM-Tutorial: Build a DeepEP-like GPU Buffer
☆143 · Updated 2 months ago
DeepLink-org / DLSlime
DLSlime: Flexible & Efficient Heterogeneous Transfer Toolkit
☆82 · Updated this week
SymbioticLab / Oobleck
A resilient distributed training framework
☆96 · Updated last year