bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
⭐94 · Updated this week
Alternatives and similar repositories for BentoVLLM:
Users who are interested in BentoVLLM compare it to the libraries listed below.
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ⭐224 · Updated this week
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ⭐58 · Updated 2 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ⭐136 · Updated 7 months ago
- End-to-End LLM Guide ⭐104 · Updated 8 months ago
- Routing on Random Forest (RoRF) ⭐135 · Updated 5 months ago
- ⭐54 · Updated 2 months ago
- Tutorial for building LLM router ⭐187 · Updated 8 months ago
- DSPY on action with OpenSource LLMs. ⭐68 · Updated 11 months ago
- Using LlamaIndex with Ray for productionizing LLM applications ⭐71 · Updated last year
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ⭐104 · Updated 3 months ago
- ⭐65 · Updated 9 months ago
- ⭐18 · Updated 5 months ago
- ⭐99 · Updated 6 months ago
- ⭐173 · Updated last week
- High level library for batched embeddings generation, blazingly-fast web-based RAG and quantized indexes processing ⚡ ⭐67 · Updated 4 months ago
- ⭐76 · Updated 9 months ago
- Complete example of how to build an Agentic RAG architecture with Redis, Amazon Bedrock, and LlamaIndex. ⭐91 · Updated 3 months ago
- Solving data for LLMs - Create quality synthetic datasets! ⭐145 · Updated 2 months ago
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ⭐61 · Updated 2 months ago
- This project enhances the construction of RAG applications by addressing challenges, improving accessibility, scalability, and managing d… ⭐142 · Updated 11 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ⭐65 · Updated 11 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ⭐45 · Updated 5 months ago
- Embed anything. ⭐29 · Updated 9 months ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ⭐64 · Updated 4 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ⭐59 · Updated 7 months ago
- Simple examples using Argilla tools to build AI ⭐53 · Updated 4 months ago
- ⭐60 · Updated 11 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ⭐83 · Updated last week
- A Lightweight Library for AI Observability ⭐237 · Updated last month