Many articles cover the principles of latency optimization for LLMs, but it is often unclear how to implement them in practice. This repository provides practical techniques for reducing the latency of GenAI applications.
☆37 May 6, 2024 Updated 2 years ago
Alternatives and similar repositories for The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications
Users that are interested in The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications are comparing it to the libraries listed below.
- ☆14 Apr 1, 2025 Updated last year
- Generative AI Ops RAG project template ☆44 Apr 21, 2026 Updated last month
- This hands-on walks you through fine-tuning an open source LLM on Azure and serving the fine-tuned model on Azure. It is intended for Dat… ☆12 Jun 23, 2024 Updated last year
- It summarizes the algorithms of Machine Learning. ☆12 Oct 26, 2025 Updated 7 months ago
- State‑of‑the‑art speech recognition model for English, delivering transcription accuracy across diverse audio scenarios. <metadata> gpu: … ☆21 Apr 16, 2025 Updated last year
- This solution converts speech to text and then processes and summarizes the text based on the prompt scenario. ☆20 Aug 8, 2024 Updated last year
- A Durable Task Python SDK compatible with the Durable Task Scheduler ☆34 Updated this week
- Solution Accelerator: Using Logic Apps & Form Recognizer ☆15 Sep 22, 2023 Updated 2 years ago
- A collection of Korean NLP hands-on labs on Amazon SageMaker ☆19 Dec 20, 2023 Updated 2 years ago
- Tui Utility to test REST APIs ☆13 Nov 20, 2023 Updated 2 years ago
- Creates an Azure AI Studio hub, project and required dependent resources including Azure Open AI Service, Cognitive Search and more. ☆33 Oct 2, 2024 Updated last year
- Coffee Chat Voice Assistant is a voice-driven ordering system powered by Azure OpenAI GPT-4o Realtime API, simulating the experience of o… ☆31 May 4, 2026 Updated last month
- This lab is a 1-day/2-day end-to-end SLM workshop led and developed by AI GBB. Attendees will learn how to quickly and easily perform the… ☆46 Jan 22, 2026 Updated 4 months ago
- Data from "Crowdsourcing of Parallel Corpora: the Case of Style Transfer for Detoxification" paper ☆14 Apr 3, 2025 Updated last year
- Assistant API to chat with tabular data and perform analytics in natural language. ☆57 Aug 30, 2024 Updated last year
- Unofficial entropix impl for Gemma2 and Llama and Qwen2 and Mistral ☆17 Jan 12, 2025 Updated last year
- Datasets and models included in the book "Introduction to Bayesian Data Analysis for Cognitive Science". ☆17 Apr 21, 2026 Updated last month
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆52 Mar 1, 2024 Updated 2 years ago
- Library to convert natural language utterance into a structured domain specific language ☆20 Feb 11, 2026 Updated 4 months ago
- This hands-on lab walks you through a step-by-step approach to efficiently serving and fine-tuning large-scale Korean models on AWS infra… ☆26 Feb 8, 2024 Updated 2 years ago
- ☆15 Mar 6, 2024 Updated 2 years ago
- ☆125 May 12, 2026 Updated last month
- AI-Sentry: A lightweight, pluggable facade layer for Azure Open AI, addressing common cross-cutting concerns for enterprise-wide scaling. ☆17 Aug 4, 2025 Updated 10 months ago
- This solution converts speech to text and then processes and summarizes the text based on the prompt scenario. ☆39 Oct 8, 2024 Updated last year
- Contoso Outdoors Company web application shown at Microsoft Ignite ☆59 May 10, 2024 Updated 2 years ago
- High-performance open-source orchestration utility that utilizes EBS Direct APIs to efficiently clone, copy and migrate EBS snapshots to … ☆39 Dec 11, 2024 Updated last year
- R package that helps to render interlinear glossed linguistic examples in html rmarkdown documents and then semi-automatically compiles t… ☆17 Nov 18, 2025 Updated 6 months ago
- Azure OpenAI benchmarking tool ☆152 May 28, 2024 Updated 2 years ago
- GPT-5 and Opus 4.1 implementations of one-shot coding examples ☆18 Jun 4, 2026 Updated last week
- Performs benchmarking on two Korean datasets with minimal time and effort. ☆45 Jan 22, 2026 Updated 4 months ago
- Open source repository to help others learn about IaC and the various flavors ☆18 Apr 16, 2024 Updated 2 years ago
- ☆30 Apr 8, 2022 Updated 4 years ago
- Datasets and models included in the book "Linear Mixed Models for Linguistics and Psychology: A Comprehensive Introduction". ☆18 Oct 22, 2025 Updated 7 months ago
- My talks! ☆15 Mar 11, 2026 Updated 3 months ago
- A beamer template mainly for Japanese. ☆14 Apr 21, 2024 Updated 2 years ago
- React CodeGen using GPT ☆12 Feb 11, 2024 Updated 2 years ago
- MOD-DOCKER is an open-source MOD DUO emulator for Linux based on Docker that lets you play around with hundreds of LV2 audio plugins! ☆11 Aug 10, 2023 Updated 2 years ago
- Workshop on reproducible research practices for psychologists ☆20 Apr 30, 2022 Updated 4 years ago
- Tilt apiserver based on kubernetes/apiserver ☆20 May 6, 2026 Updated last month