IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
☆62 · Updated 2 months ago
Alternatives and similar repositories for text-generation-inference
Users interested in text-generation-inference are comparing it to the libraries listed below.
- Benchmark suite for LLMs from Fireworks.ai ☆84 · Updated last week
- vLLM adapter for a TGIS-compatible gRPC server ☆45 · Updated this week
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP ☆52 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆132 · Updated last week
- A collection of available inference solutions for LLMs ☆93 · Updated 9 months ago
- ☆267 · Updated last week
- 🕹️ Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models ☆139 · Updated last year
- LM Engine, a library for pretraining and finetuning LLMs ☆77 · Updated this week
- ☆198 · Updated last year
- ☆67 · Updated 5 months ago
- Experiments with inference on Llama ☆103 · Updated last year
- Train, tune, and infer the Bamba model ☆136 · Updated 6 months ago
- Google TPU optimizations for Transformers models ☆123 · Updated 10 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆327 · Updated this week
- Easy and Efficient Quantization for Transformers ☆203 · Updated 5 months ago
- Inference server benchmarking tool ☆130 · Updated 2 months ago
- Data preparation code for the Amber 7B LLM ☆93 · Updated last year
- ☆42 · Updated last week
- Accelerating your LLM training to full speed! Made with ❤️ by ServiceNow Research ☆262 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆93 · Updated this week
- A collection of reproducible inference-engine benchmarks ☆38 · Updated 7 months ago
- ☆39 · Updated 3 years ago
- Pre-training code for the CrystalCoder 7B LLM ☆55 · Updated last year
- Self-host LLMs with vLLM and BentoML ☆161 · Updated last week
- 🦄 Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆212 · Updated 2 weeks ago
- 👷 Build compute kernels ☆190 · Updated this week
- vLLM performance dashboard ☆38 · Updated last year
- ☆16 · Updated last week
- ☆64 · Updated 8 months ago