bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
☆100 Updated this week
Alternatives and similar repositories for BentoVLLM:
Users who are interested in BentoVLLM compare it to the libraries listed below.
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ☆248 Updated this week
- Tutorial for building an LLM router ☆193 Updated 8 months ago
- ☆57 Updated 2 weeks ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆66 Updated 5 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 Updated 8 months ago
- Fine-tune an LLM to perform batch inference and online serving. ☆107 Updated this week
- A Lightweight Library for AI Observability ☆239 Updated last month
- ☆99 Updated 7 months ago
- This project showcases an LLMOps pipeline that fine-tunes a small LLM to prepare for an outage of the service LLM. ☆303 Updated last week
- ☆60 Updated last year
- Using LlamaIndex with Ray for productionizing LLM applications ☆71 Updated last year
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ☆105 Updated this week
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆58 Updated this week
- Run AI-generated code in isolated sandboxes ☆50 Updated 2 months ago
- ☆66 Updated 10 months ago
- A flexible, adaptive classification system for dynamic text classification ☆150 Updated last month
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆65 Updated last year
- An OpenAI Completions API compatible server for NLP transformers models ☆65 Updated last year
- Vector Database with support for late interaction and token-level embeddings. ☆54 Updated 6 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ☆59 Updated 7 months ago
- ☆80 Updated 2 months ago
- Efficient vector database for hundreds of millions of embeddings. ☆205 Updated 10 months ago
- Routing on Random Forest (RoRF) ☆139 Updated 6 months ago
- GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling ☆129 Updated 5 months ago
- Client Code Examples, Use Cases and Benchmarks for Enterprise h2oGPTe RAG-Based GenAI Platform ☆86 Updated last month
- DSPy in action with open-source LLMs. ☆70 Updated last year
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆63 Updated 3 months ago
- Experiments with inference on Llama ☆104 Updated 10 months ago
- Evaluation of the BM42 sparse indexing algorithm ☆65 Updated 9 months ago
- A prompting library ☆158 Updated 6 months ago