ialacol / text-inference-batcher

A high-performance batching router that optimises throughput for text inference workloads

☆16

Alternatives and similar repositories for text-inference-batcher

Users who are interested in text-inference-batcher are comparing it to the libraries listed below.

enjalot / latent-data-modal
Using modal.com to process FineWeb-edu data
☆20 Updated 3 months ago
aigeek0x0 / radiantloom-email-assist-7b
Radiantloom Email Assist 7B is an email-assistant large language model fine-tuned from Zephyr-7B-Beta, over a custom-curated dataset of 1…
☆14 Updated last year
teknium1 / ShareGPT-Builder
☆115 Updated 7 months ago
QuixiAI / kraken
☆66 Updated last year
chimezie / mlx-tuning-fork
Very basic framework for composable parameterized large language model (Q)LoRA / (Q)Dora fine-tuning using mlx, mlx_lm, and OgbujiPT.
☆42 Updated last month
multiplexerai / mplx_rag
Complex RAG backend
☆29 Updated last year
yoheinakajima / autofinetune
auto fine tune of models with synthetic data
☆76 Updated last year
matthewrenze / jhu-concise-cot
The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models
☆22 Updated 8 months ago
catena-labs / moa-llm
A Python library to orchestrate LLMs in a neural network-inspired structure
☆49 Updated 9 months ago
nicholasyager / llama-cpp-guidance
A guidance compatibility layer for llama-cpp-python
☆35 Updated last year
discus-labs / discus
A data-centric AI package for ML/AI. Get the best high-quality data for the best results. Discord: https://discord.gg/t6ADqBKrdZ
☆63 Updated last year
log10-io / log10
Python client library for improving your LLM app accuracy
☆98 Updated 5 months ago
michaelfeil / embed
A stable, fast and easy-to-use inference library with a focus on a sync-to-async API
☆45 Updated 10 months ago
itsPreto / VECTR8
Embed anything.
☆28 Updated last year
BBischof / yapping
Verbosity control for AI agents
☆64 Updated last year
swyxio / openlangmem
☆47 Updated last year
Xalp / ECHO
Official homepage for "Self-Harmonized Chain of Thought" (NAACL 2025)
☆91 Updated 6 months ago
shivamsanju / ragswift
🚀 Scale your RAG pipeline using Ragswift: A scalable centralized embeddings management platform
☆38 Updated last year
spyglass-search / talos
Easily create LLM automation/agent workflows
☆59 Updated last year
lightblue-tech / lb-reranker
☆23 Updated 6 months ago
adrienbrault / hf-gguf-to-ollama
Dagger functions to import Hugging Face GGUF models into a local ollama instance and optionally push them to ollama.com.
☆116 Updated last year
jmanhype / dspy-self-discover-framework
Leveraging DSPy for AI-driven task understanding and solution generation, the Self-Discover Framework automates problem-solving through r…
☆66 Updated last year
monk1337 / auto-ollama
run ollama & gguf easily with a single command
☆52 Updated last year
mzbac / mlx-moe
Scripts to create your own moe models using mlx
☆90 Updated last year
SLAM-group / newhope
☆22 Updated 2 years ago
l4b4r4b4b4 / AIDocks
LLM-Training-API: Including Embeddings & ReRankers, mergekit, LaserRMT
☆27 Updated last year
flowaicom / flow-judge
Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…
☆76 Updated 9 months ago
mithril-security / blind_llama_client
Zero-trust AI APIs for easy and private consumption of open-source LLMs
☆40 Updated last year
VatsaDev / NanoPhi-alpha
GPT-2 small trained on phi-like data
☆67 Updated last year
thooton / muse
Let's create synthetic textbooks together :)
☆75 Updated last year