intel / llm-on-ray
Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray
☆129 · Updated last month
Alternatives and similar repositories for llm-on-ray
Users interested in llm-on-ray are comparing it to the repositories listed below.
- LLM Serving Performance Evaluation Harness · ☆78 · Updated 3 months ago
- A low-latency & high-throughput serving engine for LLMs · ☆379 · Updated 3 weeks ago
- The driver for LMCache core to run in vLLM · ☆41 · Updated 4 months ago
- Efficient and easy multi-instance LLM serving · ☆437 · Updated this week
- Compare different hardware platforms via the Roofline Model for LLM inference tasks · ☆100 · Updated last year
- Efficiently tune any LLM from HuggingFace using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate the … · ☆58 · Updated 2 years ago
- NVIDIA NCCL Tests for Distributed Training · ☆97 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆76 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai · ☆76 · Updated 2 weeks ago
- A large-scale simulation framework for LLM inference · ☆382 · Updated 7 months ago
- Materials for learning SGLang · ☆443 · Updated this week
- Perplexity GPU Kernels · ☆364 · Updated last week
- KV cache store for distributed LLM inference · ☆269 · Updated 2 weeks ago
- Dynamic Memory Management for Serving LLMs without PagedAttention · ☆396 · Updated 3 weeks ago
- Large Language Model Text Generation Inference on Habana Gaudi · ☆33 · Updated 3 months ago
- Fast and memory-efficient exact attention · ☆74 · Updated this week
- NVIDIA Inference Xfer Library (NIXL) · ☆413 · Updated this week
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ☆116 · Updated 6 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving · ☆67 · Updated last year
- Disaggregated serving system for Large Language Models (LLMs) · ☆617 · Updated 2 months ago