AI-Maker-Space / FastAPI-LLM-Model-Serving
How to quickly serve an LLM using FastAPI, Celery, and Redis
☆16 Updated 2 years ago
Alternatives and similar repositories for FastAPI-LLM-Model-Serving
Users that are interested in FastAPI-LLM-Model-Serving are comparing it to the libraries listed below
- Fine-tune an LLM to perform batch inference and online serving. ☆115 Updated 7 months ago
- Find the optimal model serving solution for 🤗 Hugging Face models ☆45 Updated 5 months ago
- [ACL'25] Official Code for LlamaDuo: LLMOps Pipeline for Seamless Migration from Service LLMs to Small-Scale Local LLMs ☆314 Updated 5 months ago
- GenAIOps on Kubernetes: A collection of reference architectures for running GenAI at scale on Kubernetes using OSS tooling ☆135 Updated last year
- Sample notebooks and prompts for LLM evaluation ☆158 Updated 2 months ago
- Self-host LLMs with vLLM and BentoML ☆163 Updated last month
- Decoding ML articles hub: Hands-on articles with code on production-grade ML ☆140 Updated 10 months ago
- Multimodal AI workloads: batch inference, model training and online serving. ☆105 Updated 4 months ago
- GenAI Experimentation ☆59 Updated 4 months ago
- A collection of hands-on notebooks for LLM practitioners ☆51 Updated 11 months ago
- Use NVIDIA NIMs with Haystack pipelines ☆31 Updated last year
- A set of scripts and notebooks on LLM fine-tuning and dataset creation ☆113 Updated last year
- A repository for all ZenML projects that are specific production use-cases. ☆291 Updated last month
- A collection of fine-tuning notebooks! ☆29 Updated 2 years ago
- A collection of all available inference solutions for the LLMs ☆93 Updated 10 months ago
- How far can we go with an LLM for a classification problem ☆24 Updated last year
- Using LlamaIndex with Ray for productionizing LLM applications ☆71 Updated 2 years ago
- This repository will contain the presentation and python jupyter notebooks for the DataHack Summit 2024 conference talk, Improving Real-w… ☆121 Updated last year
- ☆89 Updated 2 years ago
- Collection of reference workflows for building intelligent agents with NIMs ☆183 Updated 11 months ago
- A Hands-on Practical Guide to LlamaIndex ☆33 Updated last year
- Miscellaneous codes and writings for MLOps ☆15 Updated last month
- Examples of RAG using LlamaIndex with local LLMs - Gemma, Mixtral 8x7B, Llama 2, Mistral 7B, Orca 2, Phi-2, Neural 7B ☆131 Updated last year
- Learn the ins and outs of efficiently serving Large Language Models (LLMs). Dive into optimization techniques, including KV caching and L… ☆17 Updated last year
- GPT2 fine-tuning pipeline with KerasNLP, TensorFlow, and TensorFlow Extended ☆33 Updated 2 years ago
- Context-Aware RAG library for Knowledge Graph ingestion and retrieval functions. ☆49 Updated 2 months ago
- Examples of using Evidently to evaluate, test and monitor ML models. ☆45 Updated 3 weeks ago
- ☆75 Updated last year
- Complete implementation of Llama2 with/without KV cache & inference ☆49 Updated last year
- Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da ☆119 Updated 9 months ago