intel / llm-on-ray
Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆108 · Updated 2 months ago
Alternatives and similar repositories for llm-on-ray:
Users interested in llm-on-ray are comparing it to the libraries listed below:
- ☆52 · Updated 4 months ago
- Materials for learning SGLang ☆166 · Updated last week
- ☆41 · Updated last month
- Efficient and easy multi-instance LLM serving ☆276 · Updated this week
- LLM Serving Performance Evaluation Harness ☆65 · Updated 4 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆60 · Updated 9 months ago
- ☆43 · Updated 6 months ago
- ☆215 · Updated this week
- A low-latency & high-throughput serving engine for LLMs ☆296 · Updated 4 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆47 · Updated this week
- ☆114 · Updated 10 months ago
- A large-scale simulation framework for LLM inference ☆310 · Updated last month
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆88 · Updated 10 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆64 · Updated last month
- Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆107 · Updated last month
- Modular and structured prompt caching for low-latency LLM inference ☆83 · Updated 2 months ago
- ☆150 · Updated this week
- Efficiently fine-tune any HuggingFace LLM using distributed training (multiple GPUs) and DeepSpeed. Uses Ray AIR to orchestrate the … ☆53 · Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆272 · Updated last month
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆236 · Updated 10 months ago
- NVIDIA NCCL Tests for Distributed Training ☆78 · Updated last month
- ☆167 · Updated 3 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆280 · Updated last month
- Making Long-Context LLM Inference 10x Faster and 10x Cheaper ☆361 · Updated this week
- A collection of all available inference solutions for LLMs ☆74 · Updated 4 months ago
- CUDA checkpoint and restore utility ☆264 · Updated 9 months ago
- PyTorch distributed training acceleration framework ☆38 · Updated this week
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆109 · Updated 10 months ago
- ☆392 · Updated this week
- A throughput-oriented high-performance serving framework for LLMs ☆692 · Updated 3 months ago