bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
☆79 · Updated this week
Alternatives and similar repositories for BentoVLLM:
Users who are interested in BentoVLLM are comparing it to the libraries listed below:
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆183 · Updated last month
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆97 · Updated last month
- ☆49 · Updated 2 weeks ago
- Dynamic Metadata based RAG Framework ☆71 · Updated 5 months ago
- Tutorial for building LLM router ☆170 · Updated 5 months ago
- DSPy in action with open-source LLMs. ☆63 · Updated 9 months ago
- ☆76 · Updated 7 months ago
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ☆62 · Updated 2 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆55 · Updated 3 weeks ago
- ☆18 · Updated 3 months ago
- A collection of all available inference solutions for the LLMs ☆74 · Updated 4 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆58 · Updated last week
- End-to-End LLM Guide ☆99 · Updated 6 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆138 · Updated 5 months ago
- Using LlamaIndex with Ray for productionizing LLM applications ☆71 · Updated last year
- Complete example of how to build an Agentic RAG architecture with Redis, AWS Bedrock, and LlamaIndex. ☆84 · Updated last month
- ☆109 · Updated this week
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆56 · Updated 2 months ago
- ☆96 · Updated 4 months ago
- ☆150 · Updated this week
- Routing on Random Forest (RoRF) ☆98 · Updated 3 months ago
- A list of AI memory projects ☆66 · Updated last week
- Web App for generating synthetic data ☆46 · Updated 4 months ago
- Own your AI, search the web with it 🌐😎 ☆74 · Updated this week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆121 · Updated this week
- ☆137 · Updated 5 months ago
- Beating the GAIA benchmark with Transformers Agents. 🚀 ☆77 · Updated 2 months ago
- Vector Database with support for late interaction and token level embeddings. ☆51 · Updated 3 months ago
- ☆69 · Updated this week
- GPT-4 Level Conversational QA Trained In a Few Hours ☆58 · Updated 4 months ago