Pretrain, finetune and serve LLMs on Intel platforms with Ray
☆130 · Sep 23, 2025 · Updated 5 months ago
Alternatives and similar repositories for llm-on-ray
Users that are interested in llm-on-ray are comparing it to the libraries listed below
- RayLLM - LLMs on Ray (Archived). Read README for more info. ☆1,267 · Mar 13, 2025 · Updated 11 months ago
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆14 · Jan 8, 2026 · Updated last month
- RayDP provides simple APIs for running Spark on Ray and integrating Spark with AI libraries. ☆368 · Feb 1, 2026 · Updated last month
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆78 · Apr 6, 2024 · Updated last year
- GLake: optimizing GPU memory management and IO transmission. ☆498 · Mar 24, 2025 · Updated 11 months ago
- ☆13 · Jan 7, 2025 · Updated last year
- Set-theoretic model theory in Coq ☆11 · Aug 18, 2022 · Updated 3 years ago
- ☆11 · Mar 13, 2023 · Updated 2 years ago
- ☆130 · Dec 24, 2024 · Updated last year
- Python tools ☆14 · Oct 22, 2023 · Updated 2 years ago
- AutoML 2024: HPOD: Hyperparameter Optimization for Unsupervised Outlier Detection ☆12 · Jul 12, 2024 · Updated last year
- ☆47 · Jun 27, 2024 · Updated last year
- Llama INT4 CUDA inference with AWQ ☆54 · Jan 20, 2025 · Updated last year
- Yet another coding assistant powered by an LLM. ☆16 · Sep 11, 2024 · Updated last year
- Kubernetes operator providing Ray|Spark|Dask|MPI clusters on demand ☆15 · Oct 26, 2023 · Updated 2 years ago
- SpotServe: Serving Generative Large Language Models on Preemptible Instances ☆135 · Feb 22, 2024 · Updated 2 years ago
- A toolkit to run Ray applications on Kubernetes ☆2,355 · Updated this week
- Serving multiple LoRA finetuned LLMs as one ☆1,144 · May 8, 2024 · Updated last year
- YiRage (Yield Revolutionary AGile Engine) - Multi-Backend LLM Inference Optimization. Extends Mirage with comprehensive support for CUDA,… ☆36 · Jan 28, 2026 · Updated last month
- LLM query engine to retrieve augmented responses from JSON files. ☆16 · Oct 12, 2023 · Updated 2 years ago
- ☆17 · Oct 9, 2023 · Updated 2 years ago
- Bookmarklet to pull and run Hugging Face GGUF models in Ollama ☆17 · Oct 17, 2024 · Updated last year
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆63 · Sep 18, 2025 · Updated 5 months ago
- [OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable ☆210 · Sep 21, 2024 · Updated last year
- Code for the paper "EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning" ☆37 · Oct 1, 2025 · Updated 5 months ago
- Machine Learning Inference Graph Spec ☆21 · Jul 27, 2019 · Updated 6 years ago
- AMD rocAL is designed to efficiently decode and process images and videos from a variety of storage formats and modify them through a… ☆23 · Feb 27, 2026 · Updated last week
- ☆27 · Jan 7, 2025 · Updated last year
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆464 · May 30, 2025 · Updated 9 months ago
- ☆14 · Nov 7, 2025 · Updated 3 months ago
- VSCode extension for working with Architecture as Code in the C4 model. Includes syntax highlighting, diagram preview, and tools for wo… ☆32 · Feb 25, 2026 · Updated last week
- Resources regarding evML (edge-verified machine learning) ☆22 · Jan 4, 2025 · Updated last year
- ☆19 · Feb 25, 2024 · Updated 2 years ago
- ⚡ Build your chatbot within minutes on your favorite device; offers SOTA compression techniques for LLMs; runs LLMs efficiently on Intel Pl… ☆2,175 · Oct 8, 2024 · Updated last year
- Compare different hardware platforms via the Roofline Model for LLM inference tasks. ☆119 · Mar 13, 2024 · Updated last year
- Testing various methods of moving Arrow data between processes ☆16 · Mar 29, 2023 · Updated 2 years ago
- ☆19 · Oct 2, 2023 · Updated 2 years ago
- This repository contains code for the MicroAdam paper. ☆21 · Dec 14, 2024 · Updated last year
- Training tiny models to prove hard theorems ☆41 · Feb 15, 2026 · Updated 2 weeks ago