Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
☆106Aug 14, 2024Updated last year
Alternatives and similar repositories for batch-inference
Users that are interested in batch-inference are comparing it to the libraries listed below.
- Predict the performance of LLM inference services☆23Sep 18, 2025Updated 7 months ago
- Python Inference Script(PyIS)☆19Aug 30, 2022Updated 3 years ago
- A proof of concept library for generating and running machine learning model tests☆13Sep 27, 2020Updated 5 years ago
- ☆23Mar 24, 2023Updated 3 years ago
- Converts PyTorch BERT weights to TensorFlow☆22May 19, 2020Updated 5 years ago
- [EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches.☆75Mar 8, 2024Updated 2 years ago
- Project Moab software stack☆24May 23, 2023Updated 2 years ago
- LLM Serving Performance Evaluation Harness☆85Feb 25, 2025Updated last year
- ☆18Jan 2, 2024Updated 2 years ago
- Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion☆20Jul 9, 2019Updated 6 years ago
- Simple and easy stable diffusion inference with LightningModule on GPU, CPU and MPS (Possibly all devices supported by Lightning).☆17Jul 27, 2023Updated 2 years ago
- Official code for Guiding Language Model Math Reasoning with Planning Tokens☆19Feb 29, 2024Updated 2 years ago
- A time delay estimation method for event-based time-series data. Time delay estimation is also known as the correction of time offsets an…☆15Dec 3, 2025Updated 4 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.☆2,108Jun 30, 2025Updated 10 months ago
- Simple Dynamic Batching Inference☆144Mar 8, 2022Updated 4 years ago
- ☆32Jan 30, 2023Updated 3 years ago
- superfast text to speech in any voice☆62Feb 16, 2026Updated 2 months ago
- Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)☆28Jun 28, 2023Updated 2 years ago
- golang vad (voice activity detection) library based on webrtc☆12Dec 13, 2021Updated 4 years ago
- The helm chart for setting up SearXNG with kubernetes.☆44Mar 10, 2025Updated last year
- Translation of the StrategyQA dataset☆22Apr 12, 2024Updated 2 years ago
- Office addins development☆14May 11, 2016Updated 9 years ago
- ☆22Dec 3, 2021Updated 4 years ago
- Entity Linking within a Social Media Platform☆11May 2, 2019Updated 6 years ago
- Summary of system papers/frameworks/codes/tools on training or serving large model☆57Dec 17, 2023Updated 2 years ago
- ☆18Oct 31, 2022Updated 3 years ago
- PHO-LID: A Unified Model to Incorporate Acoustic-Phonetic and Phonotactic Information for Language Identification☆21Aug 24, 2023Updated 2 years ago
- High-performance vector search engine with no loss of accuracy through GPU and dynamic placement☆32Jul 12, 2025Updated 9 months ago
- ☆21May 13, 2022Updated 3 years ago
- ☆125Mar 17, 2024Updated 2 years ago
- Fine-tuning llama2 using the chain-of-thought approach☆10Nov 18, 2023Updated 2 years ago
- Navigation Turing Test (NTT): Learning to Evaluate Human-Like Navigation [ICML 2021]☆15Jul 17, 2025Updated 9 months ago
- Imitation learning from multiple experts☆13Aug 29, 2022Updated 3 years ago
- experiments with inference on llama☆103Jun 6, 2024Updated last year
- WeChat auto-chat bot based on mouse and keyboard operations☆13Nov 26, 2024Updated last year
- PyTorch implementation of WaveFit [2022, Google] which is one of SOTA lightweight/fast speech vocoders.☆65Sep 8, 2025Updated 7 months ago
- The official implementation of the method discussed in the paper Improving Spoken Language Identification with Map-Mix(work accepted at I…☆18Feb 17, 2023Updated 3 years ago
- Hands-on Artificial Intelligence with TensorFlow, published by Packt☆11Feb 16, 2021Updated 5 years ago
- This repository showcases how to implement trunk-based development workflow while working in a Machine Learning project.☆41Sep 16, 2022Updated 3 years ago