A high-performance batching router that optimises maximum throughput for text inference workloads
★16 · Sep 6, 2023 · Updated 2 years ago
Alternatives and similar repositories for text-inference-batcher
Users that are interested in text-inference-batcher are comparing it to the libraries listed below
- LLM inference optimization simulator, modeling compute-bound prefill and memory-bound decode phases. ★13 · Jul 12, 2025 · Updated 7 months ago
- Graph model execution API for Candle ★17 · Jul 27, 2025 · Updated 7 months ago
- The purpose of this repository is only to share various ComfyUI workflows from time to time ★55 · Nov 21, 2025 · Updated 3 months ago
- Using modal.com to process FineWeb-edu data ★20 · Apr 5, 2025 · Updated 11 months ago
- A CUDA runtime environment based on the CUDA Driver API ★15 · Jul 30, 2025 · Updated 7 months ago
- Get deterministic output in any format like json from any LLM. ★19 · Apr 25, 2023 · Updated 2 years ago
- This library supports evaluating disparities in generated image quality, diversity, and consistency between geographic regions. ★20 · Jun 3, 2024 · Updated last year
- Sparse autoencoders for Contra text embedding models ★25 · Apr 24, 2024 · Updated last year
- B-Llama3o a llama3 with Vision Audio and Audio understanding as well as text and Audio and Animation Data output. ★26 · Jun 3, 2024 · Updated last year
- A curated list of awesome papers about utilizing large language models for ranking. ★31 · Oct 30, 2024 · Updated last year
- ★24 · Feb 2, 2026 · Updated last month
- PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation ★32 · Nov 16, 2024 · Updated last year
- ★14 · Feb 18, 2026 · Updated 2 weeks ago
- You can use it to modify HTTP(S) response values, redirect static file requests to the local file directory, and support batch modificat… ★18 · Nov 30, 2022 · Updated 3 years ago
- User-friendly viewer for Parquet files ★10 · Jan 10, 2026 · Updated last month
- DOS Program Development ★13 · Nov 9, 2022 · Updated 3 years ago
- Training code for Sparse Autoencoders on Embedding models ★39 · Feb 27, 2025 · Updated last year
- utilities for loading and running text embeddings with onnx ★45 · Aug 16, 2025 · Updated 6 months ago
- A domain-specific language (DSL) based on Triton but providing higher-level abstractions. ★41 · Feb 4, 2026 · Updated last month
- ★10 · Jan 9, 2024 · Updated 2 years ago
- a SplineCamera react component ★14 · Feb 18, 2024 · Updated 2 years ago
- LightGBM for handling label-imbalanced data with focal and weighted loss functions in binary and multiclass classification ★21 · Jan 29, 2026 · Updated last month
- Protocol buffers and other common resources. ★13 · Updated this week
- Rapid Response sample Foundry app ★17 · Updated this week
- ChatGPT CSS style ★14 · Apr 28, 2024 · Updated last year
- This project is based on the [LTX-Video](https://github.com/Lightricks/LTX-Video) algorithm of the diffusers and optimized and accelerate… ★13 · Dec 31, 2024 · Updated last year
- ★11 · Dec 6, 2023 · Updated 2 years ago
- Redis distributed lock implementation for Python based on Pub/Sub messaging ★11 · Feb 14, 2026 · Updated 2 weeks ago
- A complete (gRPC service and lib) Rust inference with multilingual embedding support. This version leverages the power of Rust for both GR… ★40 · Aug 20, 2024 · Updated last year
- BERT score for text generation ★12 · Jan 15, 2025 · Updated last year
- Dockerized Ethereum testnets ★13 · Jun 3, 2018 · Updated 7 years ago
- Array quantization and compression ★14 · Dec 8, 2023 · Updated 2 years ago
- KPI Reporter is a dev-friendly, on-premises tool for crafting automated reports on what matters to you. ★10 · Oct 6, 2022 · Updated 3 years ago
- [ICML 2025] Efficiently Serving Large Multimodal Models Using EPD Disaggregation ★22 · May 29, 2025 · Updated 9 months ago
- Create semver tags and releases. Decide which version to increment. ★10 · Jul 2, 2024 · Updated last year
- ★11 · Mar 24, 2025 · Updated 11 months ago
- Push notification service for changes in COVID-19 case counts and new announcements (using data from the KCDC COVID-19 website) ★12 · Jan 5, 2023 · Updated 3 years ago
- Yet another frontend for LLM, written using .NET and WinUI 3 ★10 · Sep 14, 2025 · Updated 5 months ago
- Deep Learning papers reading roadmap for anyone who is eager to learn this amazing tech! ★10 · Aug 29, 2018 · Updated 7 years ago