huggingface / text-embeddings-inference
A blazing fast inference solution for text embeddings models
☆3,624 · Updated last week
Alternatives and similar repositories for text-embeddings-inference
Users interested in text-embeddings-inference are comparing it to the libraries listed below.
- Infinity is a high-throughput, low-latency serving engine for text-embeddings, reranking models, CLIP, CLAP and ColPali ☆2,202 · Updated 2 weeks ago
- Fast, accurate, lightweight Python library to make state-of-the-art embeddings ☆2,103 · Updated 2 weeks ago
- MTEB: Massive Text Embedding Benchmark ☆2,568 · Updated this week
- Large Language Model Text Generation Inference ☆10,172 · Updated this week
- LMDeploy is a toolkit for compressing, deploying, and serving LLMs. ☆6,455 · Updated this week
- An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm. ☆4,857 · Updated last month
- SGLang is a fast serving framework for large language models and vision language models. ☆14,814 · Updated this week
- Multi-LoRA inference server that scales to thousands of fine-tuned LLMs ☆2,989 · Updated 2 weeks ago
- Retrieval and retrieval-augmented LLMs ☆9,833 · Updated last week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,724 · Updated this week
- Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean… ☆11,355 · Updated last week
- Supercharge Your LLM Application Evaluations 🚀 ☆9,334 · Updated last week
- Enforce the output format (JSON Schema, regex, etc.) of a language model ☆1,818 · Updated 3 months ago
- LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalabili… ☆3,262 · Updated this week
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. ☆2,176 · Updated 3 weeks ago
- Efficient retrieval augmentation and generation framework ☆1,558 · Updated 4 months ago
- MII makes low-latency and high-throughput inference possible, powered by DeepSpeed. ☆2,014 · Updated 2 months ago
- Freeing data processing from scripting madness by providing a set of platform-agnostic, customizable pipeline processing blocks. ☆2,396 · Updated last week
- PyTorch-native post-training library ☆5,233 · Updated this week
- Tools for merging pretrained large language models. ☆5,774 · Updated this week
- RayLLM - LLMs on Ray (archived). Read the README for more info. ☆1,260 · Updated 2 months ago
- Easily use and train state-of-the-art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-… ☆3,473 · Updated 2 weeks ago
- Accessible large language models via k-bit quantization for PyTorch. ☆7,088 · Updated last week
- This includes the original implementation of SELF-RAG: Learning to Retrieve, Generate and Critique through Self-Reflection by Akari Asai,… ☆2,087 · Updated last year
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,436 · Updated last week
- Developer-friendly, embedded retrieval engine for multimodal AI. Search more; manage less. ☆6,502 · Updated last week
- Go ahead and axolotl questions ☆9,506 · Updated this week
- LLMPerf is a library for validating and benchmarking LLMs ☆922 · Updated 5 months ago
- S-LoRA: Serving Thousands of Concurrent LoRA Adapters ☆1,826 · Updated last year
- ⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW 2025 Resource) ☆2,353 · Updated this week