IBM / text-generation-inference
IBM development fork of https://github.com/huggingface/text-generation-inference
☆60 · Updated 4 months ago
Alternatives and similar repositories for text-generation-inference:
Users interested in text-generation-inference are comparing it to the libraries listed below.
- vLLM adapter for a TGIS-compatible gRPC server. ☆27 · Updated this week
- Collection of tuning recipes with HuggingFace SFTTrainer and PyTorch FSDP. ☆41 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai. ☆70 · Updated 2 months ago
- Dolomite Engine is a library for pretraining/finetuning LLMs. ☆52 · Updated this week
- ☆207 · Updated this week
- Python library for Synthetic Data Generation. ☆42 · Updated this week
- Inference server benchmarking tool. ☆56 · Updated last week
- InstructLab Training Library - Efficient Fine-Tuning with Message-Format Data. ☆38 · Updated this week
- ☆59 · Updated last month
- Large Language Model Text Generation Inference on Habana Gaudi. ☆33 · Updated last month
- Experiments with inference on llama. ☆104 · Updated 11 months ago
- ☆50 · Updated 5 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs. ☆263 · Updated 6 months ago
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs). ☆86 · Updated this week
- Train, tune, and infer the Bamba model. ☆115 · Updated last week
- ☆53 · Updated 11 months ago
- Google TPU optimizations for transformers models. ☆109 · Updated 3 months ago
- ☆45 · Updated last week
- SGLang is a fast serving framework for large language models and vision language models. ☆22 · Updated 2 months ago
- Efficient and Scalable Estimation of Tool Representations in Vector Space. ☆23 · Updated 8 months ago
- Performance Comparison of MLOps Engines, Frameworks, and Languages on Mainstream AI Models. ☆136 · Updated 9 months ago
- Repo hosting code and materials related to speeding up LLM inference using token merging. ☆36 · Updated last year
- A toolkit for fine-tuning, inferencing, and evaluating GreenBitAI's LLMs. ☆83 · Updated last month
- Unitxt is a Python library for enterprise-grade evaluation of AI performance, offering the world's largest catalog of tools and data … ☆191 · Updated this week
- [ICLR 2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding. ☆116 · Updated 5 months ago
- ☆66 · Updated 11 months ago
- Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs. ☆284 · Updated this week
- Docker image for NVIDIA GH200 machines, optimized for vLLM serving and HF Trainer fine-tuning. ☆40 · Updated 2 months ago
- ☆15 · Updated last month
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆29 · Updated this week