huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
☆26 · Updated this week
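For context, TGI serves models over an HTTP generation API regardless of the backend hardware, and tgi-gaudi keeps that interface. Below is a minimal sketch of querying a running tgi-gaudi server with Python's `requests`; the `localhost:8080` endpoint and the sampling parameters are assumptions for a locally launched container with a model already loaded.

```python
# Minimal sketch: query a running TGI (tgi-gaudi) server over its HTTP API.
# Assumes a server is already listening on localhost:8080 (adjust as needed).
import requests

TGI_URL = "http://localhost:8080/generate"  # assumed local endpoint

payload = {
    "inputs": "What is text generation inference?",
    "parameters": {
        "max_new_tokens": 64,   # cap on generated tokens
        "temperature": 0.7,     # sampling temperature (illustrative value)
        "do_sample": True,
    },
}

resp = requests.post(TGI_URL, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["generated_text"])
```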
Related projects
Alternatives and complementary repositories for tgi-gaudi
- Easy and lightning-fast training of 🤗 Transformers on the Habana Gaudi processor (HPU) ☆152 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆41 · Updated this week
- Pretrain, fine-tune and serve LLMs on Intel platforms with Ray ☆101 · Updated last week
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆251 · Updated this week
- GenAI components at the microservice level; a GenAI service composer to create mega-services ☆68 · Updated this week
- ☆189 · Updated this week
- Reference models for Intel(R) Gaudi(R) AI Accelerator ☆155 · Updated this week
- Easy and Efficient Quantization for Transformers ☆178 · Updated 3 months ago
- ☆44 · Updated last month
- AMD-related optimizations for transformer models ☆57 · Updated this week
- This repo contains documentation for the OPEA project ☆26 · Updated this week
- Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU (XPU) device. Note… ☆57 · Updated 2 months ago
- Materials for learning SGLang ☆75 · Updated this week
- Benchmark suite for LLMs from Fireworks.ai ☆58 · Updated this week
- OpenAI Triton backend for Intel® GPUs ☆143 · Updated this week
- A high-throughput and memory-efficient inference and serving engine for LLMs ☆43 · Updated this week
- IBM development fork of https://github.com/huggingface/text-generation-inference ☆57 · Updated last month
- Production-ready LLM model compression/quantization toolkit with accelerated inference support for both CPU/GPU via HF, vLLM, and SGLang ☆118 · Updated this week
- ☆156 · Updated last month
- Advanced Quantization Algorithm for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for t… ☆245 · Updated this week
- ☆109 · Updated 7 months ago
- A low-latency & high-throughput serving engine for LLMs ☆231 · Updated last month
- OpenAI-compatible API for the TensorRT-LLM Triton backend ☆174 · Updated 3 months ago
- Dynamic batching library for deep learning inference. Tutorials for LLM, GPT scenarios ☆85 · Updated 2 months ago
- Evaluation, benchmark, and scorecard targeting performance on throughput and latency, accuracy on popular evaluation harnesses, safety… ☆22 · Updated this week
- [NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization ☆302 · Updated 2 months ago
- ☆33 · Updated 3 months ago
- Ultra-Fast and Cheaper Long-Context LLM Inference ☆194 · Updated this week
- Applied AI experiments and examples for PyTorch ☆159 · Updated last week
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens ☆611 · Updated 2 months ago