lapp0 / lm-inference-engines
Comparison of Language Model Inference Engines
☆240 · Dec 16, 2024 · Updated last year
Alternatives and similar repositories for lm-inference-engines
Users interested in lm-inference-engines are comparing it to the libraries listed below.
- ☆51 · May 31, 2024 · Updated last year
- Modified beam search with periodic restart ☆12 · Sep 12, 2024 · Updated last year
- Large-scale LLM inference engine ☆1,651 · Jan 21, 2026 · Updated 3 weeks ago
- A collection of reproducible inference engine benchmarks ☆38 · Apr 22, 2025 · Updated 9 months ago
- The official API server for Exllama. OAI-compatible, lightweight, and fast. ☆1,129 · Updated this week
- A fast inference library for running LLMs locally on modern consumer-class GPUs ☆4,445 · Dec 9, 2025 · Updated 2 months ago
- A tool for measuring the sequential performance of any OpenAI-compatible LLM API ☆22 · Aug 1, 2024 · Updated last year
- Efficient finetuning for OpenAI GPT-OSS ☆23 · Oct 2, 2025 · Updated 4 months ago
- ☆18 · Aug 19, 2024 · Updated last year
- Inferflow is an efficient and highly configurable inference engine for large language models (LLMs). ☆251 · Mar 15, 2024 · Updated last year
- Proxy server for a Triton gRPC server that runs inference on an embedding model, written in Rust ☆21 · Aug 10, 2024 · Updated last year
- ☆56 · Nov 18, 2024 · Updated last year
- JacQues is a Dash-based interactive web application that facilitates real-time chat and document management. ☆22 · Jan 5, 2026 · Updated last month
- Vocabulary Parallelism ☆25 · Mar 10, 2025 · Updated 11 months ago
- ☆329 · Updated this week
- 1-Click is all you need. ☆63 · Apr 29, 2024 · Updated last year
- Generate interleaved text and image content in a structured format you can pass directly to downstream APIs. ☆29 · Oct 18, 2024 · Updated last year
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆7,606 · Updated this week
- Cleanai (https://github.com/willmil11/cleanai), except I'm making it in C now. Fast and clean from the start this time :) ☆17 · Feb 5, 2026 · Updated last week
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ☆1,011 · Sep 4, 2024 · Updated last year
- To try TextIn document parsing, visit https://cc.co/16YSIy ☆22 · Jul 9, 2024 · Updated last year
- Benchmark of structured generation libraries ☆30 · Oct 25, 2024 · Updated last year
- Easy and efficient quantization for Transformers ☆205 · Jan 28, 2026 · Updated 2 weeks ago
- ☆63 · Jul 10, 2025 · Updated 7 months ago
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees", adapted for Llama models ☆41 · Aug 4, 2023 · Updated 2 years ago
- A high-performance inference system for large language models, designed for production environments ☆492 · Dec 19, 2025 · Updated last month
- ☆26 · Feb 11, 2025 · Updated last year
- Stable Diffusion and Flux in pure C/C++ ☆24 · Feb 7, 2026 · Updated last week
- Ko-Arena-Hard-Auto: an automatic LLM benchmark for Korean ☆22 · Apr 23, 2025 · Updated 9 months ago
- llama3.cuda is a pure C/CUDA implementation of the Llama 3 model. ☆350 · Apr 27, 2025 · Updated 9 months ago
- BERTScore for text generation ☆12 · Jan 15, 2025 · Updated last year
- An ethnic-culture dataset for large language models ☆12 · May 26, 2025 · Updated 8 months ago
- Yet another frontend for LLMs, written in .NET and WinUI 3 ☆10 · Sep 14, 2025 · Updated 5 months ago
- ☆12 · May 30, 2025 · Updated 8 months ago
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆13 · Mar 30, 2024 · Updated last year
- This project demonstrates the use of generic bi-directional LSTM models for predicting the importance of words in spoken dialogue for under… ☆10 · Mar 24, 2023 · Updated 2 years ago
- Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM ☆2,737 · Updated this week
- Tools for merging pretrained large language models ☆6,783 · Jan 26, 2026 · Updated 2 weeks ago
- A simple no-install web UI for Ollama and OAI-compatible APIs! ☆31 · Jan 30, 2025 · Updated last year