huggingface / optimum-habanaLinks

Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)

☆201

Alternatives and similar repositories for optimum-habana

Users that are interested in optimum-habana are comparing it to the libraries listed below

Sorting:

huggingface / optimum-benchmark
🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…
☆320Updated 2 months ago
huggingface / tgi-gaudi
Large Language Model Text Generation Inference on Habana Gaudi
☆34Updated 8 months ago
huggingface / optimum-intel
🤗 Optimum Intel: Accelerate inference with Intel optimization tools
☆513Updated last week
HabanaAI / Model-References
Reference models for Intel(R) Gaudi(R) AI Accelerator
☆169Updated 2 months ago
NVIDIA-NeMo / Run
A tool to configure, launch and manage your machine learning experiments.
☆208Updated last week
neuralmagic / nm-vllm
A high-throughput and memory-efficient inference and serving engine for LLMs
☆267Updated last year
NetEase-FuXi / EETQ
Easy and Efficient Quantization for Transformers
☆203Updated 5 months ago
microsoft / batch-inference
Dynamic batching library for Deep Learning inference. Tutorials for LLM, GPT scenarios.
☆106Updated last year
NVIDIA / NeMo-Framework-Launcher
Provides end-to-end model development pipelines for LLMs and Multimodal models that can be launched on-prem or cloud-native.
☆509Updated 7 months ago
huggingface / optimum-graphcore
Blazing fast training of 🤗 Transformers on Graphcore IPUs
☆85Updated last year
foundation-model-stack / fms-fsdp
🚀 Efficiently (pre)training foundation models with native PyTorch features, including FSDP for training and SDPA implementation of Flash…
☆271Updated last week
HabanaAI / vllm-fork
A high-throughput and memory-efficient inference and serving engine for LLMs
☆85Updated last week
fpgaminer / GPTQ-triton
GPTQ inference Triton kernel
☆315Updated 2 years ago
foundation-model-stack / foundation-model-stack
🚀 Collection of components for development, training, tuning, and inference of foundation models leveraging PyTorch native components.
☆217Updated last week
apple / ml-recurrent-drafter
☆219Updated 10 months ago
huggingface / optimum-tpu
Google TPU optimizations for transformers models
☆123Updated 10 months ago
fw-ai / benchmark
Benchmark suite for LLMs from Fireworks.ai
☆84Updated last week
triton-inference-server / vllm_backend
☆317Updated last week
intel / neural-speed
An innovative library for efficient LLM inference via low-bit quantization
☆350Updated last year
anyscale / llm-continuous-batching-benchmarks
☆122Updated last year
intel / intel-extension-for-deepspeed
Intel® Extension for DeepSpeed* is an extension to DeepSpeed that brings feature support with SYCL kernels on Intel GPU(XPU) device. Note…
☆63Updated 5 months ago
triton-inference-server / pytorch_backend
The Triton backend for the PyTorch TorchScript models.
☆166Updated last week
triton-inference-server / onnxruntime_backend
The Triton backend for the ONNX Runtime.
☆168Updated this week
meta-pytorch / applied-ai
Applied AI experiments and examples for PyTorch
☆308Updated 3 months ago
pytorch / PiPPy
Pipeline Parallelism for PyTorch
☆784Updated last year
coreweave / tensorizer
Module, Model, and Tensor Serialization/Deserialization
☆277Updated 3 months ago
meta-pytorch / float8_experimental
This repository contains the experimental PyTorch native float8 training UX
☆226Updated last year
dropbox / gemlite
Fast low-bit matmul kernels in Triton
☆401Updated 2 weeks ago
Cornell-RelaxML / QuIP
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
☆390Updated last year
meta-pytorch / torchsnapshot
A performant, memory-efficient checkpointing library for PyTorch applications, designed with large, complex distributed workloads in mind…
☆161Updated 2 months ago