AI-Maker-Space / FastAPI-LLM-Model-Serving
How to quickly serve an LLM using FastAPI, Celery, and Redis
☆16 · Updated 2 years ago
Alternatives and similar repositories for FastAPI-LLM-Model-Serving
Users interested in FastAPI-LLM-Model-Serving are comparing it to the libraries listed below.
- Fine-tune an LLM to perform batch inference and online serving. ☆120 · Updated 8 months ago
- 💻 Decoding ML articles hub: Hands-on articles with code on production-grade ML ☆141 · Updated 11 months ago
- GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling ☆135 · Updated last year
- Find the optimal model serving solution for 🤗 Hugging Face models 🚀 ☆45 · Updated 6 months ago
- A repository for all ZenML projects that are specific production use-cases. ☆302 · Updated 2 months ago
- 🚀 Use NVIDIA NIMs with Haystack pipelines ☆32 · Updated last year
- Using LlamaIndex with Ray for productionizing LLM applications ☆71 · Updated 2 years ago
- Self-host LLMs with vLLM and BentoML ☆168 · Updated 2 weeks ago
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆314 · Updated 6 months ago
- Miscellaneous codes and writings for MLOps ☆15 · Updated 3 weeks ago
- Build Enterprise RAG (Retrieval-Augmented Generation) Pipelines to tackle various Generative AI use cases with LLMs by simply plugging co… ☆117 · Updated last year
- Multimodal AI workloads: batch inference, model training and online serving. ☆106 · Updated 5 months ago
- Mistral + Haystack: build RAG pipelines that rock 🤘 ☆106 · Updated 2 years ago
- GenAI Experimentation ☆59 · Updated 5 months ago
- Structured pruning and bias visualization for Large Language Models. Tools for LLM optimization and fairness analysis. ☆27 · Updated this week
- Sample notebooks and prompts for LLM evaluation ☆159 · Updated 3 months ago
- A collection of fine-tuning notebooks! ☆30 · Updated 2 years ago
- ☆56 · Updated last year
- Examples of RAG using LlamaIndex with local LLMs - Gemma, Mixtral 8x7B, Llama 2, Mistral 7B, Orca 2, Phi-2, Neural 7B ☆130 · Updated last year
- ☆75 · Updated last year
- This repository will contain the presentation and python jupyter notebooks for the DataHack Summit 2024 conference talk, Improving Real-w… ☆121 · Updated last year
- Complete implementation of Llama2 with/without KV cache & inference 🚀 ☆49 · Updated last year
- An NVIDIA AI Workbench example project for fine-tuning a Mistral 7B model ☆69 · Updated last year
- A collection of hands-on notebooks for LLM practitioners ☆51 · Updated last year
- ☆184 · Updated last week
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆119 · Updated 10 months ago
- Large Language Model (LLM) Inference API and Chatbot ☆128 · Updated last year
- Various projects using Large Language Models (GPT & LLaMA) and other open-source models from HuggingFace and OpenAI. OpenAI API required for ru… ☆112 · Updated last month
- Context-Aware RAG library for Knowledge Graph ingestion and retrieval functions. ☆54 · Updated last week
- Building your first LLM application with OpenAI and AI-assisted development, step by step! ☆123 · Updated 2 months ago