huseinzol05/transformers-continuous-batching

Lightweight continuous batching with an OpenAI-compatible API using HuggingFace Transformers, including T5 and Whisper.

☆29

Alternatives and similar repositories for transformers-continuous-batching

Users who are interested in transformers-continuous-batching are comparing it to the repositories listed below.

Electrofried / Astarte
☆11 · Feb 20, 2025 · Updated last year
FarFetchd / sleepyllama
an auto-sleeping and -waking framework around llama.cpp
☆12 · Feb 8, 2025 · Updated last year
gitkaz / mlx_gguf_server
A FastAPI-based LLM server that loads multiple LLM models (MLX or llama.cpp) simultaneously using multiprocessing.
☆16 · Updated this week
extopico / llama-server_mcp_proxy
Simple node proxy for llama-server that enables MCP use
☆17 · May 10, 2025 · Updated 9 months ago
lynthera / bitsegments_localminds
Offline LLM chatbot with personalized memory — works on CPU with multi-session memory support.
☆22 · Jan 10, 2026 · Updated last month
camenduru / FluxMusic-jupyter
☆19 · Sep 4, 2024 · Updated last year
charmandercha / ArchiDoc
☆17 · Dec 16, 2024 · Updated last year
wchisasa / rabbit
A fully autonomous agent that accesses the browser and performs tasks.
☆17 · Apr 25, 2025 · Updated 10 months ago
kstonekuan / digital-twin-proxy
A forward proxy to turn network traffic into personal memory for AI agents
☆36 · Feb 23, 2026 · Updated last week
hellangleZ / Qwen3_autothink_adapter
Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere…
☆22 · May 9, 2025 · Updated 9 months ago
AaronFeng753 / Better-R1
An Open WebUI function for a better R1 experience
☆78 · Mar 7, 2025 · Updated 11 months ago
runparse / agent-script
A simple, observable code-writing agent builder in TypeScript.
☆30 · Apr 9, 2025 · Updated 10 months ago
mitja / llamatunnel
Publish local LLMs and LLM apps on the internet.
☆27 · Aug 17, 2025 · Updated 6 months ago
bmen25124 / SillyTavern-MCP-Client
An extension of MCP for SillyTavern.
☆74 · Jul 26, 2025 · Updated 7 months ago
rodrigobaron / anthill
☆24 · Jan 22, 2025 · Updated last year
akashjss / orpheus-tts-local-webui
Run Orpheus 3B Locally with Gradio UI, Standalone App
☆23 · Apr 1, 2025 · Updated 11 months ago
cp3249 / splaa
SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati…
☆29 · May 6, 2025 · Updated 9 months ago
k-koehler / gguf-tensor-overrider
☆53 · Oct 10, 2025 · Updated 4 months ago
stringandstickytape / MaxsAiStudio
A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others.
☆35 · Feb 11, 2026 · Updated 3 weeks ago
ColeMurray / moondream-mcp
Moondream MCP Server in Python
☆44 · Jul 2, 2025 · Updated 8 months ago
moaljumaa / halfwayml_open
Open source tool for transcription and subtitling, an alternative to happyscribe.
☆33 · Feb 12, 2025 · Updated last year
PkmX / orpheus-chat-webui
Orpheus Chat WebUI
☆76 · Mar 27, 2025 · Updated 11 months ago
Praful932 / llmsearch
Find better generation parameters for your LLM
☆27 · Jun 9, 2024 · Updated last year
grctest / FastAPI-BitNet
Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker.
☆36 · Jul 2, 2025 · Updated 8 months ago
sammcj / moa
Mixture-of-Ollamas
☆30 · Aug 12, 2024 · Updated last year
shinomakoi / AI-Messenger
A Qt GUI for large language models
☆39 · Dec 27, 2023 · Updated 2 years ago
SicariusSicariiStuff / SLOP_Detector
SLOP Detector and analyzer based on dictionary for shareGPT JSON and text
☆82 · Feb 7, 2026 · Updated 3 weeks ago
Write-with-LAIKA / drama-engine
A Framework for Narrative Agents
☆37 · Sep 24, 2024 · Updated last year
smy20011 / MorningRadio
Generate Your Own Private Morning Radio for Commute
☆32 · Feb 5, 2025 · Updated last year
homelab-00 / TranscriptionSuite
A fully local & private Speech-To-Text app for Linux, Windows & macOS. Python backend + Electron frontend, utilizing faster-whisper and C…
☆125 · Updated this week
KoljaB / LocalEmotionalAIVoiceChat
Simulates talk with an AI that can express emotions
☆83 · Jun 17, 2025 · Updated 8 months ago
AlgorithmicKing737 / orpheus-tts-local-openai
Run Orpheus 3B Locally With LM Studio
☆32 · Mar 20, 2025 · Updated 11 months ago
pwilkin / llama-runner
Llama.cpp runner/swapper and proxy that emulates LMStudio / Ollama backends
☆52 · Aug 21, 2025 · Updated 6 months ago
julianthomas04 / Nova2
An SDK for building AI assistants in Python
☆43 · Sep 21, 2025 · Updated 5 months ago
Aditya239233 / GNNExplainer
Code for running experiments and benchmarking on GNNExplainer: Generating Explanations for Graph Neural Networks
☆15 · May 8, 2021 · Updated 4 years ago
thang-nm / Google-Translate.popclipext
📋 Instant Google Translate for PopClip app
☆10 · Jul 29, 2022 · Updated 3 years ago
hwcloud-RAS / SmartHW
☆11 · May 16, 2025 · Updated 9 months ago
fbarberis / tiktok-tts
Text to audio with Tik-Tok Voices
☆13 · Apr 6, 2023 · Updated 2 years ago
IWantBe / MDPS
☆12 · Sep 19, 2022 · Updated 3 years ago