huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
☆33 · Updated this week
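Since tgi-gaudi tracks upstream text-generation-inference, a running server speaks the standard TGI HTTP API. Below is a minimal Python sketch of querying it, assuming a server has already been launched on a Gaudi host (e.g. from the project's ghcr.io/huggingface/tgi-gaudi Docker image) and is listening on localhost port 8080; the prompt and generation parameters are illustrative.

```python
# Minimal sketch: query a tgi-gaudi server through the standard TGI
# /generate endpoint. Assumes the server was started separately on a Gaudi
# host (e.g. via the project's Docker image) and listens on localhost:8080.
import requests

resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": "What is Habana Gaudi?",
        "parameters": {"max_new_tokens": 32},  # illustrative settings
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```

The same server also exposes `/generate_stream` for token streaming, matching upstream TGI's behavior.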
Alternatives and similar repositories for tgi-gaudi:
Users interested in tgi-gaudi are comparing it to the libraries listed below.
- Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU) ☆175 · Updated this week (a training sketch follows the list below)
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆58 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆61 · Updated last week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆120 · Updated 2 weeks ago
- Reference models for Intel® Gaudi® AI Accelerator ☆160 · Updated 2 weeks ago
- Benchmark suite for LLMs from Fireworks.ai ☆69 · Updated last month
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆289 · Updated last month
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆60 · Updated 2 months ago
- A low-latency & high-throughput serving engine for LLMs ☆319 · Updated last month
- Applied AI experiments and examples for PyTorch ☆243 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆168 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆13 · Updated 2 weeks ago
- oneCCL Bindings for Pytorch* ☆89 · Updated last week
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆308 · Updated 3 weeks ago
- NVIDIA NCCL Tests for Distributed Training ☆82 · Updated this week
- ArcticTraining is a framework designed to simplify and accelerate the post-training process for large language models (LLMs) ☆45 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆261 · Updated 5 months ago
- Materials for learning SGLang ☆328 · Updated 2 weeks ago
- Fast low-bit matmul kernels in Triton ☆257 · Updated this week
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆116 · Updated last year
- This repo contains documentation for the OPEA project ☆30 · Updated this week
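As noted next to the first entry, the closest companion to tgi-gaudi in this list is optimum-habana, which targets training on the same HPU hardware. A minimal sketch of its drop-in Trainer replacement follows, assuming a Gaudi host with optimum-habana installed; the `gaudi_config_name` value follows the project's README example, and the tiny in-memory dataset is purely illustrative, so treat the exact arguments as assumptions.

```python
# Minimal sketch: swap transformers' Trainer for optimum-habana's GaudiTrainer
# to fine-tune on HPU. Argument names follow the optimum-habana README; the
# dataset here is a toy stand-in, just enough for the Trainer API to run.
from optimum.habana import GaudiTrainer, GaudiTrainingArguments
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tiny in-memory dataset (a list of feature dicts works with the default collator).
texts = ["Gaudi runs this", "so does the HPU"] * 8
enc = tokenizer(texts, truncation=True, padding=True)
train_dataset = [
    {**{k: v[i] for k, v in enc.items()}, "labels": i % 2}
    for i in range(len(texts))
]

args = GaudiTrainingArguments(
    output_dir="./hpu-out",
    use_habana=True,                               # run on HPU instead of CUDA
    use_lazy_mode=True,                            # Habana lazy-mode execution
    gaudi_config_name="Habana/bert-base-uncased",  # precision/optimizer config
    num_train_epochs=1,
    per_device_train_batch_size=4,
)

GaudiTrainer(model=model, args=args, train_dataset=train_dataset).train()
```

The point of the project is this one-line swap: `GaudiTrainingArguments` and `GaudiTrainer` mirror their transformers counterparts while handling HPU placement and lazy-mode graph execution.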