huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
⭐34 · Updated 6 months ago
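For context, tgi-gaudi serves models over TGI's standard HTTP interface. A minimal sketch of querying a running server via the `/generate` endpoint, assuming a container is already serving a model at localhost:8080 (host and port are placeholders, not from this listing):

```python
# Minimal sketch: querying a tgi-gaudi server via TGI's /generate endpoint.
# Assumes a server is already running at localhost:8080 (placeholder address).
import requests

resp = requests.post(
    "http://localhost:8080/generate",
    json={
        "inputs": "What is deep learning?",
        "parameters": {"max_new_tokens": 64, "temperature": 0.7},
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```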
Alternatives and similar repositories for tgi-gaudi
Users interested in tgi-gaudi are comparing it to the libraries listed below.
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ⭐198 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐83 · Updated last week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ⭐63 · Updated 3 months ago
- Reference models for Intel(R) Gaudi(R) AI Accelerator ⭐165 · Updated 2 weeks ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ⭐315 · Updated 2 weeks ago
- Benchmark suite for LLMs from Fireworks.ai ⭐83 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ⭐265 · Updated 11 months ago
- Easy and Efficient Quantization for Transformers ⭐203 · Updated 3 months ago
- Pretrain, finetune and serve LLMs on Intel platforms with Ray ⭐132 · Updated 2 weeks ago
- IBM development fork of https://github.com/huggingface/text-generation-inference ⭐61 · Updated 3 weeks ago
- Google TPU optimizations for transformers models ⭐120 · Updated 8 months ago
- ArcticInference: vLLM plugin for high-throughput, low-latency inference ⭐270 · Updated this week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs (see the usage sketch after this list) ⭐90 · Updated this week
- JetStream is a throughput and memory optimized engine for LLM inference on XLA devices, starting with TPUs (and GPUs in future -- PRs wel… ⭐379 · Updated 3 months ago
- A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM ⭐53 · Updated this week
- A tool to configure, launch and manage your machine learning experiments. ⭐195 · Updated last week
- 🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components. ⭐213 · Updated last week
- A safetensors extension to efficiently store sparse quantized tensors on disk ⭐167 · Updated this week
- PyTorch/XLA integration with JetStream (https://github.com/google/JetStream) for LLM inference ⭐74 · Updated 3 weeks ago
- Module, Model, and Tensor Serialization/Deserialization ⭐267 · Updated last month
- Explore training for quantized models ⭐24 · Updated 2 months ago
- An innovative library for efficient LLM inference via low-bit quantization ⭐350 · Updated last year
- Fast low-bit matmul kernels in Triton ⭐376 · Updated last week
- Boosting 4-bit inference kernels with 2:4 Sparsity ⭐82 · Updated last year
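Several entries above describe vLLM or vLLM-based engines. As referenced in the vLLM item, here is a minimal sketch of its offline Python API; the model id is a placeholder, not taken from this listing:

```python
# Minimal sketch of vLLM's offline inference API (the model id is a
# placeholder; any supported Hugging Face causal LM will work).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)
outputs = llm.generate(["What is deep learning?"], params)
for out in outputs:
    print(out.outputs[0].text)  # generated continuation for the prompt
```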