bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
★120 · Updated this week
Alternatives and similar repositories for BentoVLLM
Users who are interested in BentoVLLM are comparing it to the libraries listed below.
- Tutorial for building LLM router ★210 · Updated 11 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ★137 · Updated 10 months ago
- ★62 · Updated 2 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models ★106 · Updated 2 months ago
- ★101 · Updated 9 months ago
- A Lightweight Library for AI Observability ★245 · Updated 4 months ago
- ★61 · Updated last year
- Fine-tune an LLM to perform batch inference and online serving. ★112 · Updated 3 weeks ago
- Using LlamaIndex with Ray for productionizing LLM applications ★71 · Updated last year
- Vector Database with support for late interaction and token level embeddings. ★55 · Updated 8 months ago
- SGLang is a fast serving framework for large language models and vision language models. ★23 · Updated 4 months ago
- Deployment of a light and full OpenAI API for production with vLLM to support /v1/embeddings with all embeddings models. ★42 · Updated 11 months ago
- ★204 · Updated last week
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ★67 · Updated last year
- Own your AI, search the web with it ★87 · Updated 5 months ago
- GPT-4 Level Conversational QA Trained In a Few Hours ★62 · Updated 10 months ago
- DSPy in action with open-source LLMs. ★72 · Updated last year
- Routing on Random Forest (RoRF) ★170 · Updated 9 months ago
- Complete example of how to build an Agentic RAG architecture with Redis, Amazon Bedrock, and LlamaIndex. ★92 · Updated 6 months ago
- GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling ★130 · Updated 7 months ago
- ★19 · Updated 4 months ago
- Evaluation of bm42 sparse indexing algorithm ★68 · Updated 11 months ago
- A stable, fast and easy-to-use inference library with a focus on a sync-to-async API ★45 · Updated 8 months ago
- Solving data for LLMs - Create quality synthetic datasets! ★149 · Updated 5 months ago
- Simple examples using Argilla tools to build AI ★53 · Updated 7 months ago
- Set of scripts to finetune LLMs ★37 · Updated last year
- ★75 · Updated 5 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs ★348 · Updated this week
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ★59 · Updated 2 months ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ★70 · Updated 7 months ago