ray-project / Ray-Connect
Material for Ray Connect 2024 Conference
☆11 · Updated 11 months ago
Alternatives and similar repositories for Ray-Connect
Users interested in Ray-Connect are comparing it to the libraries listed below.
- The driver for LMCache core to run in vLLM ☆51 · Updated 7 months ago
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray ☆132 · Updated this week
- Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads. AntRay i… ☆145 · Updated last week (see the minimal Ray sketch after this list)
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆113 · Updated last year
- ☆59 · Updated last year
- TePDist (TEnsor Program DISTributed) is an HLO-level automatic distributed system for DL models. ☆95 · Updated 2 years ago
- Modular and structured prompt caching for low-latency LLM inference ☆100 · Updated 10 months ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆129 · Updated last year
- Stateful LLM Serving ☆84 · Updated 6 months ago
- Mobius is an AI infrastructure platform for distributed online learning, including online sample processing, training and serving. ☆99 · Updated last year
- PyTorch distributed training acceleration framework ☆52 · Updated last month
- Genai-bench is a powerful benchmark tool designed for comprehensive token-level performance evaluation of large language model (LLM) serv… ☆211 · Updated 3 weeks ago
- LLM Serving Performance Evaluation Harness ☆79 · Updated 7 months ago
- Omni_Infer is a suite of inference accelerators designed for the Ascend NPU platform, offering native support and an expanding feature se… ☆71 · Updated this week
- Efficient and easy multi-instance LLM serving ☆487 · Updated 3 weeks ago
- LLM Inference benchmark ☆426 · Updated last year
- GLake: optimizing GPU memory management and IO transmission. ☆480 · Updated 6 months ago
- ☆24 · Updated 5 months ago
- ☆218 · Updated 2 years ago
- ☆55 · Updated 10 months ago
- KV cache store for distributed LLM inference ☆335 · Updated 2 weeks ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆183 · Updated last year
- ☆47 · Updated last year
- ☆130 · Updated 11 months ago
- A prefill & decode disaggregated LLM serving framework with shared GPU memory and fine-grained compute isolation. ☆108 · Updated 4 months ago
- ☆95 · Updated 6 months ago
- OpenEmbedding is an open-source framework for TensorFlow distributed training acceleration. ☆33 · Updated 2 years ago
- A high-performance serving system for DeepRec based on TensorFlow Serving. ☆19 · Updated last year
- A low-latency & high-throughput serving engine for LLMs ☆418 · Updated 3 months ago
- LLM inference via Triton (flexible & modular): focused on kernel optimization using CUBIN binaries, starting from the gpt-oss model ☆44 · Updated 3 weeks ago
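Several of the entries above build on or extend Ray itself (the Intel "LLMs on Intel platforms with Ray" stack, AntRay, Mobius). For orientation, here is a minimal sketch of Ray's core remote-task API; it uses only documented Ray calls (`ray.init`, `@ray.remote`, `ray.get`), and the local single-node init is an illustrative assumption rather than a real cluster setup.

```python
import ray

# Start a local Ray runtime for illustration; on a real cluster you
# would connect instead, e.g. ray.init(address="auto").
ray.init()

# Decorating a function with @ray.remote turns it into a task that
# Ray can schedule on any worker in the cluster.
@ray.remote
def square(x: int) -> int:
    return x * x

# Each .remote() call returns an ObjectRef immediately;
# the four tasks run in parallel.
refs = [square.remote(i) for i in range(4)]

# ray.get blocks until the results are ready and fetches them.
print(ray.get(refs))  # [0, 1, 4, 9]

ray.shutdown()
```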