There are many articles that cover the principles of latency optimization for LLMs, but it is often unclear how to actually implement these principles. This repository provides practical techniques for reducing the latency of GenAI applications.
☆36 · May 6, 2024 · Updated last year
Alternatives and similar repositories for The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications
Users that are interested in The-LLM-Latency-Guidebook-Optimizing-Response-Times-for-GenAI-Applications are comparing it to the libraries listed below.
- ☆14 · Apr 1, 2025 · Updated last year
- Generative AI Ops RAG project template ☆43 · Apr 21, 2026 · Updated last week
- It summarizes the algorithms of machine learning. ☆12 · Oct 26, 2025 · Updated 6 months ago
- State‑of‑the‑art speech recognition model for English, delivering transcription accuracy across diverse audio scenarios. <metadata> gpu: … ☆20 · Apr 16, 2025 · Updated last year
- This solution converts speech to text and then processes and summarizes the text based on the prompt scenario. ☆20 · Aug 8, 2024 · Updated last year
- Solution Accelerator: Using Logic Apps & Form Recognizer ☆15 · Sep 22, 2023 · Updated 2 years ago
- A collection of Korean NLP hands-on labs on Amazon SageMaker ☆19 · Dec 20, 2023 · Updated 2 years ago
- An R package to help assess the sensitivity of a Bayesian model (fitted with Stan) to the specification of its likelihood and priors ☆11 · Apr 8, 2025 · Updated last year
- Coffee Chat Voice Assistant is a voice-driven ordering system powered by Azure OpenAI GPT-4o Realtime API, simulating the experience of o… ☆31 · Mar 12, 2025 · Updated last year
- This lab is a 1-day/2-day end-to-end SLM workshop led and developed by AI GBB. Attendees will learn how to quickly and easily perform the… ☆46 · Jan 22, 2026 · Updated 3 months ago
- Assistant API to chat with tabular data and perform analytics in natural language. ☆56 · Aug 30, 2024 · Updated last year
- Unofficial entropix implementation for Gemma2, Llama, Qwen2, and Mistral ☆17 · Jan 12, 2025 · Updated last year
- ☆10 · Dec 27, 2024 · Updated last year
- Azure OpenAI benchmarking tool ☆29 · Apr 4, 2025 · Updated last year
- Datasets and models included in the book "Introduction to Bayesian Data Analysis for Cognitive Science". ☆17 · Apr 21, 2026 · Updated last week
- Explore the use of DSPy for extracting features from PDFs 🔎 ☆52 · Mar 1, 2024 · Updated 2 years ago
- Convert any image into a Region Adjacency Graph (RAG) ☆12 · Apr 27, 2020 · Updated 6 years ago
- Library to convert natural language utterances into a structured domain-specific language ☆19 · Feb 11, 2026 · Updated 2 months ago
- ☆15 · Oct 18, 2024 · Updated last year
- A series of templates for a Quarto project to convert a single Markdown input into beautiful and simple Word, HTML, and PDF worksheets. ☆14 · Feb 12, 2022 · Updated 4 years ago
- 🚀 Embark on your agentic journey! ☆29 · May 28, 2025 · Updated 11 months ago
- ☆15 · Mar 6, 2024 · Updated 2 years ago
- ☆125 · Feb 4, 2026 · Updated 2 months ago
- ☆30 · Feb 14, 2025 · Updated last year
- Interlinear glosses for pandoc ☆10 · Feb 12, 2018 · Updated 8 years ago
- Experiments in typesetting Japanese text with XeLaTeX ☆14 · Nov 6, 2022 · Updated 3 years ago
- AI-Sentry: A lightweight, pluggable facade layer for Azure OpenAI, addressing common cross-cutting concerns for enterprise-wide scaling. ☆17 · Aug 4, 2025 · Updated 8 months ago
- Repo for "An empirically-driven guide on using Bayes Factors for M/EEG decoding" ☆13 · Nov 2, 2022 · Updated 3 years ago
- This solution converts speech to text and then processes and summarizes the text based on the prompt scenario. ☆39 · Oct 8, 2024 · Updated last year
- Contoso Outdoors Company web application shown at Microsoft Ignite ☆58 · May 10, 2024 · Updated last year
- ☆14 · May 23, 2019 · Updated 6 years ago
- High-performance open-source orchestration utility that utilizes EBS Direct APIs to efficiently clone, copy and migrate EBS snapshots to … ☆39 · Dec 11, 2024 · Updated last year
- ☆15 · Aug 30, 2021 · Updated 4 years ago
- R package that helps to render interlinear glossed linguistic examples in html rmarkdown documents and then semi-automatically compiles t… ☆17 · Nov 18, 2025 · Updated 5 months ago
- Tiny configuration for Triton Inference Server ☆45 · Jan 10, 2025 · Updated last year
- GPT-5 and Opus 4.1 implementations of one-shot coding examples ☆18 · Updated this week
- Annotated Fuman Kaitori Center Corpus ☆18 · Dec 18, 2023 · Updated 2 years ago
- Open source repository to help others learn about IaC and the various flavors ☆18 · Apr 16, 2024 · Updated 2 years ago
- ☆30 · Apr 8, 2022 · Updated 4 years ago