bentoml / BentoVLLM
Self-host LLMs with vLLM and BentoML
☆133, updated last week
Alternatives and similar repositories for BentoVLLM
Users who are interested in BentoVLLM compare it with the libraries listed below.
- ☆62, updated 3 months ago
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆137, updated 11 months ago
- Tutorial for building an LLM router. ☆216, updated 11 months ago
- Experimental Code for StructuredRAG: JSON Response Formatting with Large Language Models. ☆108, updated 3 months ago
- A Lightweight Library for AI Observability. ☆246, updated 4 months ago
- A simple service that integrates vLLM with Ray Serve for fast and scalable LLM serving. ☆68, updated last year
- ☆101, updated 10 months ago
- Machine Learning Serving focused on GenAI with simplicity as the top priority. ☆59, updated last week
- Ready-to-go containerized RAG service. Implemented with text-embedding-inference + Qdrant/LanceDB. ☆67, updated 6 months ago
- Unattended Lightweight Text Classifiers with LLM Embeddings. ☆185, updated 10 months ago
- A collection of all available inference solutions for LLMs. ☆91, updated 4 months ago
- Fine-tune an LLM to perform batch inference and online serving. ☆112, updated last month
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs. ☆394, updated this week
- GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling. ☆130, updated 8 months ago
- Elasticsearch integration into LangChain. ☆57, updated 5 months ago
- Own your AI, search the web with it. ☆86, updated 6 months ago
- Using LlamaIndex with Ray for productionizing LLM applications. ☆71, updated last year
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models. ☆82, updated this week
- Complete example of how to build an Agentic RAG architecture with Redis, Amazon Bedrock, and LlamaIndex. ☆95, updated 7 months ago
- Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte… ☆73, updated 8 months ago
- ☆213, updated last week
- ☆159, updated 2 weeks ago
- ☆106, updated last week
- GPT-4 Level Conversational QA Trained In a Few Hours. ☆62, updated 10 months ago
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research. ☆211, updated this week
- Simple examples using Argilla tools to build AI. ☆53, updated 7 months ago
- Solving data for LLMs - Create quality synthetic datasets! ☆150, updated 5 months ago
- Vector Database with support for late interaction and token level embeddings. ☆55, updated 3 weeks ago
- Route LLM requests to the best model for the task at hand. ☆78, updated 2 weeks ago
- Modular, open source LLMOps stack that separates concerns: LiteLLM unifies LLM APIs, manages routing and cost controls, and ensures high-… ☆106, updated 4 months ago