intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆127 · Updated last month
Alternatives and similar repositories for llm-on-ray
Users interested in llm-on-ray are comparing it to the libraries listed below.
- ☆53 · Updated 8 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving (see the first sketch after this list). ☆67 · Updated last year
- Efficient and easy multi-instance LLM serving ☆420 · Updated this week
- ☆260 · Updated 2 weeks ago
- LLM Serving Performance Evaluation Harness ☆78 · Updated 3 months ago
- A low-latency & high-throughput serving engine for LLMs ☆370 · Updated this week
- ☆193 · Updated 3 weeks ago
- Benchmark suite for LLMs from Fireworks.ai ☆75 · Updated 2 weeks ago
- ☆46 · Updated 11 months ago
- ☆52 · Updated 6 months ago
- Materials for learning SGLang ☆424 · Updated last week
- ☆118 · Updated last year
- ☆85 · Updated 2 months ago
- Modular and structured prompt caching for low-latency LLM inference ☆94 · Updated 6 months ago
- Efficiently tune any LLM from Hugging Face using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate the … ☆58 · Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks (see the second sketch after this list). ☆100 · Updated last year
- The driver for LMCache core to run in vLLM ☆41 · Updated 3 months ago
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆378 · Updated last month
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆73 · Updated this week
- Perplexity GPU Kernels ☆324 · Updated last week
- ☆49 · Updated 2 months ago
- Stateful LLM Serving ☆70 · Updated 2 months ago
- NVIDIA NCCL Tests for Distributed Training ☆91 · Updated last week
- ☆99 · Updated this week
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆157 · Updated 8 months ago
- A throughput-oriented high-performance serving framework for LLMs ☆814 · Updated 3 weeks ago
- PyTorch distributed training acceleration framework ☆49 · Updated 3 months ago
- Ray - A curated list of resources: https://github.com/ray-project/ray ☆60 · Updated 4 months ago
- KV cache store for distributed LLM inference ☆250 · Updated this week
- Easy and Efficient Quantization for Transformers ☆198 · Updated 3 months ago
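
The vLLM + Ray Serve item above names a common integration pattern; the following is a minimal, hypothetical sketch of that pattern, not code from the linked repository. The model name, sampling parameters, and deployment class are placeholders; the Ray Serve and vLLM APIs used (`serve.deployment`, `serve.run`, `LLM.generate`) are standard.

```python
# Sketch: wrapping a vLLM engine in a Ray Serve deployment.
# Assumptions: placeholder model and parameters, 1 GPU per replica.
from ray import serve
from vllm import LLM, SamplingParams

@serve.deployment(ray_actor_options={"num_gpus": 1})
class VLLMDeployment:
    def __init__(self, model: str = "facebook/opt-125m"):
        # vLLM loads the model once per Serve replica.
        self.llm = LLM(model=model)
        self.params = SamplingParams(temperature=0.8, max_tokens=128)

    async def __call__(self, request):
        # request is a Starlette Request; expects {"prompt": "..."}.
        prompt = (await request.json())["prompt"]
        # Blocking generate call; fine for a sketch, a real service
        # would use vLLM's async engine.
        outputs = self.llm.generate([prompt], self.params)
        return {"text": outputs[0].outputs[0].text}

app = VLLMDeployment.bind()
# serve.run(app)  # starts an HTTP endpoint serving the model
```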
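The roofline-model item rests on one formula worth making concrete: attainable throughput is min(peak compute, memory bandwidth × arithmetic intensity). The sketch below uses illustrative A100-class numbers as assumptions, not measurements from the linked repository.

```python
# Sketch: roofline bound for LLM inference on a given platform.
def roofline_tflops(peak_tflops: float, bandwidth_tbs: float,
                    arithmetic_intensity: float) -> float:
    """Attainable TFLOP/s for a kernel with the given FLOPs-per-byte ratio."""
    return min(peak_tflops, bandwidth_tbs * arithmetic_intensity)

# Decode-phase LLM inference is memory-bound: each fp16 weight (2 bytes)
# contributes ~2 FLOPs per token, i.e. ~1 FLOP per byte read.
# Illustrative A100-class numbers: ~312 fp16 TFLOP/s peak, ~2 TB/s HBM.
bound = roofline_tflops(peak_tflops=312.0, bandwidth_tbs=2.0,
                        arithmetic_intensity=1.0)
print(f"decode-phase bound: ~{bound:.1f} TFLOP/s")  # ~2.0, far below peak
```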