Azure / The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications
Many articles cover the principles of latency optimization for LLMs, but it is often unclear how to actually implement these principles. This repository provides practical techniques for reducing the latency of GenAI applications.
☆31 · Updated last year
Alternatives and similar repositories for The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications
Users who are interested in The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications are comparing it to the repositories listed below.
- An easy way to deploy the Langfuse observability platform to Azure Container Apps with Entra authentication. ☆57 · Updated 2 months ago
- This sample shows how to quickly get started with LlamaIndex.ai on Azure. ☆59 · Updated 2 months ago
- Interactive workflows for creating AI intelligence reports from real-world data sources. ☆88 · Updated this week
- ☆51 · Updated 5 months ago
- An end-to-end sample of RAG showcasing development, evaluation, experimentation, and deployment using Promptflow, search products like Co… ☆54 · Updated last year
- An index of all of our weekly concepts + code events for aspiring AI Engineers and Business Leaders!! ☆86 · Updated last week
- ☆28 · Updated last year
- Building your first LLM application with OpenAI, and AI-assisted Development, step-by-step! ☆112 · Updated this week
- Building LLM-Enabled Multi Agent Applications from Scratch. ☆191 · Updated this week
- Example for Deploying Chatbot using Streamlit and Azure Web App. ☆52 · Updated 2 years ago
- A recipe that will walk you through using either Meta Llama 3.1 405B or OpenAI GPT-4o deployed on Azure AI to generate a synthetic datase… ☆74 · Updated 3 months ago
- Virtual focus group with custom personas, product details, and final analysis created with AutoGen, Ollama/Llama3, and Streamlit. ☆47 · Updated last year
- ☆35 · Updated 5 months ago
- ☆29 · Updated last year
- Indexing framework designed for the automated creation of structured knowledge bases in Azure AI Search. ☆14 · Updated 4 months ago
- This solution converts speech to text and then processes and summarizes the text based on the prompt scenario. ☆36 · Updated last year
- ☆13 · Updated 5 months ago
- Adding NeMo Guardrails to a LlamaIndex RAG pipeline. ☆41 · Updated last year
- Hugging Face Deep Learning Containers (DLCs) for Google Cloud. ☆153 · Updated 5 months ago
- This repository contains a toy implementation of a basic RAQA system. ☆20 · Updated last year
- Using LlamaIndex with Ray for productionizing LLM applications. ☆71 · Updated 2 years ago
- This repo is the central repo for all the RAG Evaluation reference material and partner workshop. ☆76 · Updated 5 months ago
- ☆95 · Updated last year
- Some Python code samples using Azure AI Search for Generative AI stuff. ☆66 · Updated 8 months ago
- A collection of examples and tutorials for Qdrant vector search engine. ☆193 · Updated this week
- A list of Haystack Integrations, maintained by the community or deepset. ☆98 · Updated this week
- ☆24 · Updated 10 months ago
- A multimodal Retrieval Augmented Generation with code execution capabilities. Process multiple complex documents with images, table, char… ☆72 · Updated last week
- A backend for a chat application written in Python FastAPI framework. ☆64 · Updated 3 weeks ago
- ☆55 · Updated 3 months ago