huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
☆31 · Updated last week
Alternatives and similar repositories for tgi-gaudi:
Users interested in tgi-gaudi are comparing it to the libraries listed below.
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆171 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆56 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆60 · Updated 2 months ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆159 · Updated this week
- oneCCL Bindings for Pytorch* ☆88 · Updated last month
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆112 · Updated last week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆12 · Updated 2 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆286 · Updated 2 weeks ago
- GenAI components at micro-service level; GenAI service composer to create mega-service ☆106 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆79 · Updated 3 weeks ago
- OpenVINO Tokenizers extension ☆29 · Updated this week
- ☆52 · Updated 5 months ago
- Intel® Tensor Processing Primitives extension for Pytorch* ☆10 · Updated last week
- Tutorials for running models on First-gen Gaudi and Gaudi2 for Training and Inference. The source files for the tutorials on https://dev… ☆57 · Updated last week
- Easy and Efficient Quantization for Transformers ☆193 · Updated 2 weeks ago
- Setup and Installation Instructions for Habana binaries, docker image creation ☆25 · Updated last month
- A tool for bandwidth measurements on NVIDIA GPUs. ☆364 · Updated last week
- Benchmark suite for LLMs from Fireworks.ai ☆66 · Updated last week
- ☆224 · Updated this week
- ☆34 · Updated this week
- This repo contains documents of the OPEA project ☆29 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆443 · Updated this week
- Evaluation, benchmark, and scorecard, targeting performance on throughput and latency, accuracy on popular evaluation harnesses, safety… ☆27 · Updated this week
- ☆159 · Updated this week
- ☆117 · Updated 11 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆59 · Updated 2 months ago
- oneAPI Collective Communications Library (oneCCL) ☆222 · Updated 3 weeks ago
- Development repository for the Triton language and compiler ☆107 · Updated this week
- Materials for learning SGLang ☆265 · Updated 2 weeks ago
- The Triton backend for the ONNX Runtime. ☆138 · Updated this week