huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
☆32 · Updated last week
Alternatives and similar repositories for tgi-gaudi:
Users interested in tgi-gaudi are comparing it to the libraries listed below.
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆62 · Updated this week
- Easy and lightning-fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆181 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆62 · Updated 3 weeks ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆290 · Updated 2 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆70 · Updated last month
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ☆55 · Updated this week
- Easy and Efficient Quantization for Transformers ☆195 · Updated last month
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆123 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆262 · Updated 5 months ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆162 · Updated last month
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 3 months ago
- OpenAI Triton backend for Intel® GPUs ☆172 · Updated this week
- Google TPU optimizations for transformers models ☆104 · Updated 2 months ago
- AI Accelerator Benchmark focuses on evaluating AI Accelerators from a practical production perspective, including the ease of use and ver… ☆233 · Updated last week
- oneCCL Bindings for Pytorch* ☆91 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs ☆87 · Updated this week
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆333 · Updated last week
- Applied AI experiments and examples for PyTorch ☆251 · Updated last week
- A low-latency & high-throughput serving engine for LLMs ☆330 · Updated 2 months ago
- Triton CLI is an open source command line interface that enables users to create, deploy, and profile models served by the Triton Inferen… ☆61 · Updated 2 weeks ago
- This repo contains documents of the OPEA project ☆30 · Updated this week
- Fast low-bit matmul kernels in Triton ☆272 · Updated last week
- OpenAI compatible API for TensorRT LLM triton backend ☆202 · Updated 8 months ago