huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
☆34 · Updated 9 months ago
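Since everything below is an alternative to this inference server, a quick note on how tgi-gaudi is used may help: it serves models over the same REST API as upstream text-generation-inference. The sketch below assumes a server is already running on localhost:8080 (e.g. launched via the project's Docker image); the prompt and generation parameters are illustrative placeholders.

```python
# Minimal sketch: query a running tgi-gaudi server over its REST API
# (the /generate endpoint follows upstream text-generation-inference).
# Assumes the server is already up on localhost:8080; prompt and
# parameters below are illustrative placeholders.
import requests

response = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is Habana Gaudi?",
        "parameters": {
            "max_new_tokens": 64,
            "do_sample": True,
            "temperature": 0.7,
        },
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["generated_text"])
```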
Alternatives and similar repositories for tgi-gaudi
Users interested in tgi-gaudi are comparing it to the libraries listed below.
- Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU) ☆204 · Updated last week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆86 · Updated this week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆326 · Updated 3 months ago
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) devices. Note… ☆63 · Updated 6 months ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆62 · Updated 4 months ago
- Pretrain, fine-tune, and serve LLMs on Intel platforms with Ray ☆131 · Updated 3 months ago
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆267 · Updated last month
- Benchmark suite for LLMs from Fireworks.ai ☆84 · Updated last month
- Easy and Efficient Quantization for Transformers ☆202 · Updated 6 months ago
- Reference models for the Intel® Gaudi® AI Accelerator ☆169 · Updated last week
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ☆368 · Updated last week
- ☆218 · Updated 11 months ago
- TPU inference for vLLM, with unified JAX and PyTorch support. ☆213 · Updated this week
- ☆274 · Updated this week
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆233 · Updated last week
- AIPerf is a comprehensive benchmarking tool that measures the performance of generative AI models served by your preferred inference solu… ☆90 · Updated this week
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆219 · Updated last year
- Intel Gaudi's Megatron-DeepSpeed large language models for training ☆16 · Updated last year
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ☆190 · Updated last week
- Module, Model, and Tensor Serialization/Deserialization ☆283 · Updated 4 months ago
- ☆324 · Updated this week
- 🤗 Optimum Intel: Accelerate inference with Intel optimization tools ☆528 · Updated this week
- vLLM adapter for a TGIS-compatible gRPC server. ☆47 · Updated this week
- An innovative library for efficient LLM inference via low-bit quantization ☆351 · Updated last year
- A tool to configure, launch, and manage your machine learning experiments. ☆214 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list) ☆93 · Updated this week
- LM Engine is a library for pretraining/finetuning LLMs ☆110 · Updated last week
- ☆56 · Updated last year
- Fast low-bit matmul kernels in Triton ☆423 · Updated last month
- ☆131 · Updated this week
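For the vLLM entry flagged above, here is a minimal sketch of vLLM's offline inference entry point (`LLM` and `SamplingParams`); the model id, prompt, and sampling settings are illustrative placeholders:

```python
# Minimal sketch: offline batch inference with vLLM's Python API.
# Model id, prompt, and sampling settings are illustrative placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any HF-compatible model id
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["What is speculative decoding?"], params)
for out in outputs:
    print(out.outputs[0].text)
```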