Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.
☆93 · Apr 15, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for huggingface-inference-toolkit
Users that are interested in huggingface-inference-toolkit are comparing it to the libraries listed below.
- ☆22 · Apr 17, 2026 · Updated 2 weeks ago
- AI Energy Score: Initiative to establish comparable energy efficiency ratings for AI models. ☆39 · Dec 2, 2025 · Updated 5 months ago
- YASEM - Yet Another Splade|Sparse Embedder - A simple and efficient library for SPLADE embeddings ☆13 · May 22, 2025 · Updated 11 months ago
- 🤝 Trade any tensors over the network ☆31 · Sep 27, 2023 · Updated 2 years ago
- Large Language Model Text Generation Inference on Habana Gaudi ☆34 · Mar 20, 2025 · Updated last year
- FlexiTokens ☆21 · Dec 27, 2025 · Updated 4 months ago
- Github action to connect to tailscale ☆20 · Apr 21, 2026 · Updated 2 weeks ago
- 🤗 Collection of examples on how to train, deploy and monitor HuggingFace models in Google Cloud Vertex AI ☆22 · Feb 26, 2024 · Updated 2 years ago
- Automatically derive Python dunder methods for your Rust code ☆25 · Apr 7, 2026 · Updated 3 weeks ago
- A framework for few-shot evaluation of language models. ☆36 · Apr 3, 2026 · Updated last month
- ☆29 · Nov 18, 2025 · Updated 5 months ago
- 8-bit floating point types for Rust ☆64 · Feb 4, 2026 · Updated 3 months ago
- Manage scalable open LLM inference endpoints in Slurm clusters ☆287 · Jul 11, 2024 · Updated last year
- Pixel Parsing. A reproduction of OCR-free end-to-end document understanding models with open data ☆23 · Jul 30, 2024 · Updated last year
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs … ☆65 · Feb 6, 2025 · Updated last year
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o… ☆160 · Jul 14, 2025 · Updated 9 months ago
- A Rust crate offering similar functionality to the Python transformers package using Candle. ☆14 · Nov 19, 2024 · Updated last year
- Draw a ui and make it real ☆17 · Dec 2, 2023 · Updated 2 years ago
- ☆16 · Jul 23, 2024 · Updated last year
- ☆124 · Apr 17, 2026 · Updated 2 weeks ago
- Python library to use Pleias-RAG models ☆71 · May 1, 2025 · Updated last year
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 7 months ago
- Baguetter is a flexible, efficient, and hackable search engine library implemented in Python. It's designed for quickly benchmarking, imp… ☆210 · Aug 31, 2024 · Updated last year
- ☆14 · Dec 21, 2025 · Updated 4 months ago
- [NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding ☆22 · Oct 10, 2024 · Updated last year
- Command Line Interface for Hugging Face Inference Endpoints ☆65 · Apr 10, 2024 · Updated 2 years ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,396 · Apr 17, 2026 · Updated 2 weeks ago
- ☆18 · Sep 5, 2024 · Updated last year
- Train LLM on Hugging Face infra ☆72 · Apr 2, 2026 · Updated last month
- Speech to Speech conversation using the OpenAI RealTime API in Python 🐍 ☆26 · Nov 18, 2024 · Updated last year
- 🌏 Modular retrievers for zero-shot multilingual IR. ☆30 · Mar 6, 2024 · Updated 2 years ago
- MCP server for Liveblocks. ☆15 · Feb 14, 2026 · Updated 2 months ago
- Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU) ☆209 · Updated this week
- ☆26 · Dec 13, 2024 · Updated last year
- The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained mode… ☆92 · Apr 3, 2026 · Updated last month
- ☆16 · Sep 4, 2025 · Updated 8 months ago
- SEO Technical Standards Draft ☆13 · Sep 26, 2024 · Updated last year
- Implementation of the dilated self attention as described in "LongNet: Scaling Transformers to 1,000,000,000 Tokens" ☆13 · Jul 23, 2023 · Updated 2 years ago
- Small python package to measure OCR quality and other related metrics. ☆27 · Feb 19, 2024 · Updated 2 years ago