huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
☆33 · Updated last month
Alternatives and similar repositories for tgi-gaudi
Users interested in tgi-gaudi are comparing it to the libraries listed below.
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆186 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆70 · Updated this week
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ☆126 · Updated 2 weeks ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆162 · Updated 2 weeks ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆61 · Updated 2 months ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆300 · Updated this week
- Easy and Efficient Quantization for Transformers ☆197 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆263 · Updated 7 months ago
- Benchmark suite for LLMs from Fireworks.ai ☆72 · Updated this week
- DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective. ☆13 · Updated 2 months ago
- ☆119 · Updated last year
- ☆53 · Updated 7 months ago
- ☆45 · Updated this week
- GenAI components at micro-service level; GenAI service composer to create mega-service ☆145 · Updated this week
- A low-latency & high-throughput serving engine for LLMs ☆354 · Updated 3 weeks ago
- A general 2-8 bits quantization toolbox with GPTQ/AWQ/HQQ/VPTQ, and export to onnx/onnx-runtime easily. ☆169 · Updated last month
- Dynamic Memory Management for Serving LLMs without PagedAttention ☆366 · Updated 3 weeks ago
- oneCCL Bindings for Pytorch* ☆97 · Updated 2 weeks ago
- ☆68 · Updated last month
- This repo contains documents of the OPEA project ☆38 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆184 · Updated this week
- ☆255 · Updated this week
- NVIDIA NCCL Tests for Distributed Training ☆90 · Updated this week
- ☆49 · Updated 2 months ago
- OpenVINO Tokenizers extension ☆33 · Updated this week
- MSCCL++: A GPU-driven communication stack for scalable AI applications ☆349 · Updated this week
- Module, Model, and Tensor Serialization/Deserialization ☆227 · Updated this week
- Perplexity GPU Kernels ☆281 · Updated last week
- KV cache compression for high-throughput LLM inference ☆126 · Updated 3 months ago
- QUICK: Quantization-aware Interleaving and Conflict-free Kernel for efficient LLM inference ☆117 · Updated last year