Lightweight continuous batching with OpenAI compatibility using HuggingFace Transformers, including T5 and Whisper.
☆29 · Mar 15, 2025 · Updated 11 months ago
Alternatives and similar repositories for transformers-continuous-batching
Users who are interested in transformers-continuous-batching are comparing it to the libraries listed below.
- ☆11 · Feb 20, 2025 · Updated last year
- an auto-sleeping and -waking framework around llama.cpp ☆12 · Feb 8, 2025 · Updated last year
- This is a FastAPI based LLM server. Load multiple LLM models (MLX or llama.cpp) simultaneously using multiprocessing. ☆16 · Updated this week
- Simple node proxy for llama-server that enables MCP use ☆17 · May 10, 2025 · Updated 9 months ago
- Offline LLM chatbot with personalized memory — works on CPU with multi-session memory support. ☆22 · Jan 10, 2026 · Updated last month
- ☆19 · Sep 4, 2024 · Updated last year
- ☆17 · Dec 16, 2024 · Updated last year
- A fully autonomous agent that accesses the browser and performs tasks. ☆17 · Apr 25, 2025 · Updated 10 months ago
- A forward proxy to turn network traffic into personal memory for AI agents ☆36 · Feb 23, 2026 · Updated last week
- Implemented a script that automatically adjusts Qwen3's inference and non-inference capabilities, based on an OpenAI-like API. The infere… ☆22 · May 9, 2025 · Updated 9 months ago
- An Open WebUI function for a better R1 experience ☆78 · Mar 7, 2025 · Updated 11 months ago
- A simple, observable code-writing agent builder in TypeScript. ☆30 · Apr 9, 2025 · Updated 10 months ago
- Publish local LLMs and LLM apps on the internet. ☆27 · Aug 17, 2025 · Updated 6 months ago
- An extension of MCP for SillyTavern. ☆74 · Jul 26, 2025 · Updated 7 months ago
- ☆24 · Jan 22, 2025 · Updated last year
- Run Orpheus 3B Locally with Gradio UI, Standalone App ☆23 · Apr 1, 2025 · Updated 11 months ago
- SPLAA is an AI assistant framework that utilizes voice recognition, text-to-speech, and tool-calling capabilities to provide a conversati… ☆29 · May 6, 2025 · Updated 9 months ago
- ☆53 · Oct 10, 2025 · Updated 4 months ago
- A Windows tool to query various LLM AIs. Supports branched conversations, history and summaries among others. ☆35 · Feb 11, 2026 · Updated 3 weeks ago
- Moondream MCP Server in Python ☆44 · Jul 2, 2025 · Updated 8 months ago
- Open source tool for transcription and subtitling, alternative to happyscribe. ☆33 · Feb 12, 2025 · Updated last year
- Orpheus Chat WebUI ☆76 · Mar 27, 2025 · Updated 11 months ago
- Find better generation parameters for your LLM ☆27 · Jun 9, 2024 · Updated last year
- Running Microsoft's BitNet inference framework via FastAPI, Uvicorn and Docker. ☆36 · Jul 2, 2025 · Updated 8 months ago
- Mixture-of-Ollamas ☆30 · Aug 12, 2024 · Updated last year
- A Qt GUI for large language models ☆39 · Dec 27, 2023 · Updated 2 years ago
- SLOP Detector and analyzer based on dictionary for shareGPT JSON and text ☆82 · Feb 7, 2026 · Updated 3 weeks ago
- A Framework for Narrative Agents ☆37 · Sep 24, 2024 · Updated last year
- Generate Your Own Private Morning Radio for Commute ☆32 · Feb 5, 2025 · Updated last year
- A fully local & private Speech-To-Text app for Linux, Windows & macOS. Python backend + Electron frontend, utilizing faster-whisper and C… ☆125 · Updated this week
- Simulates talk with an AI that can express emotions ☆83 · Jun 17, 2025 · Updated 8 months ago
- Run Orpheus 3B Locally With LM Studio ☆32 · Mar 20, 2025 · Updated 11 months ago
- Llama.cpp runner/swapper and proxy that emulates LMStudio / Ollama backends ☆52 · Aug 21, 2025 · Updated 6 months ago
- An AI assistant building SDK in Python ☆43 · Sep 21, 2025 · Updated 5 months ago
- Code for running experiments and benchmarking on GNNExplainer: Generating Explanations for Graph Neural Networks ☆15 · May 8, 2021 · Updated 4 years ago
- 📋 Instant Google Translate for PopClip app ☆10 · Jul 29, 2022 · Updated 3 years ago
- ☆11 · May 16, 2025 · Updated 9 months ago
- Text to audio with TikTok voices ☆13 · Apr 6, 2023 · Updated 2 years ago
- ☆12 · Sep 19, 2022 · Updated 3 years ago