IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
☆59 · Updated 2 months ago
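Since this repository tracks Hugging Face's text-generation-inference, a quick illustration of how such a server is typically queried may help orient readers. The sketch below uses the upstream project's documented REST `/generate` endpoint; the host, port, and prompt are assumptions, and the IBM fork is more commonly consumed through a TGIS-compatible gRPC interface (see the vLLM adapter entry in the list below), so treat this as illustrative of the upstream API only.

```python
# Minimal sketch: calling an upstream Hugging Face text-generation-inference
# server over its REST API. The address and prompt are assumptions; the IBM
# fork is usually reached via a TGIS gRPC interface instead.
import requests

TGI_URL = "http://localhost:8080"  # assumed address of a running TGI server

payload = {
    "inputs": "What is Deep Learning?",
    "parameters": {"max_new_tokens": 50},
}

resp = requests.post(f"{TGI_URL}/generate", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```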
Alternatives and similar repositories for text-generation-inference:
Users interested in text-generation-inference are comparing it to the libraries listed below.
- 🚀 Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP · ☆34 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai · ☆66 · Updated last week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data · ☆30 · Updated this week
- Python library for Synthetic Data Generation · ☆32 · Updated this week
- Dolomite Engine is a library for pretraining/finetuning LLMs · ☆36 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs · ☆259 · Updated 4 months ago
- ☆159 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server · ☆21 · Updated this week
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs · ☆191 · Updated this week
- The code for the paper ROUTERBENCH: A Benchmark for Multi-LLM Routing System · ☆104 · Updated 8 months ago
- ☆53 · Updated last month
- Train, tune, and run inference with the Bamba model · ☆84 · Updated last month
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer finetuning · ☆34 · Updated this week
- Easy and Efficient Quantization for Transformers · ☆193 · Updated 2 weeks ago
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs · ☆80 · Updated 2 weeks ago
- Codebase release for an EMNLP 2023 paper · ☆19 · Updated 11 months ago
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list) · ☆88 · Updated this week
- ☆65 · Updated 8 months ago
- Experiments with inference on Llama · ☆104 · Updated 8 months ago
- ☆251 · Updated 2 months ago
- Data preparation code for Amber 7B LLM · ☆85 · Updated 9 months ago
- ☆52 · Updated 5 months ago
- The driver for LMCache core to run in vLLM · ☆29 · Updated 2 weeks ago
- Self-host LLMs with vLLM and BentoML · ☆87 · Updated this week
- ☆199 · Updated last year
- Lightweight demos for finetuning LLMs. Powered by 🤗 transformers and open-source datasets · ☆67 · Updated 4 months ago
- Pre-training code for CrystalCoder 7B LLM · ☆55 · Updated 9 months ago
- ReLM is a Regular Expression engine for Language Models · ☆103 · Updated last year
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding · ☆107 · Updated 2 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity · ☆64 · Updated 5 months ago
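As referenced in the vLLM entry above, the sketch below shows vLLM's offline generation API, which several of the serving-oriented projects in this list build on or adapt. The model checkpoint and sampling settings are placeholder assumptions, not recommendations.

```python
# Minimal sketch of vLLM's offline inference API. The model checkpoint and
# sampling parameters are placeholder assumptions for illustration only.
from vllm import LLM, SamplingParams

prompts = ["The capital of France is"]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=32)

llm = LLM(model="facebook/opt-125m")  # assumed small model for a quick demo
outputs = llm.generate(prompts, sampling_params)

for output in outputs:
    print(output.outputs[0].text)
```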