mixedbread-ai / binary-embeddings
Showcases how mxbai-embed-large-v1 can be used to produce binary embeddings. Binary embeddings enable 32x storage savings and 40x faster retrieval.
☆15 · Updated last year
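The 32x storage figure follows from replacing each 32-bit float dimension with a single bit. The sketch below illustrates that arithmetic with NumPy only; it is not the repository's code. The random vectors stand in for real model output, the 1024-dimension size is an assumption about mxbai-embed-large-v1, thresholding at zero is one common binarization rule, and the helper names `binarize` and `hamming_distances` are hypothetical.

```python
import numpy as np


def binarize(embeddings: np.ndarray) -> np.ndarray:
    """Quantize float embeddings to packed bits (1 where the value is > 0)."""
    bits = (embeddings > 0).astype(np.uint8)
    # packbits stores 8 dimensions per byte along the last axis.
    return np.packbits(bits, axis=-1)


def hamming_distances(query: np.ndarray, corpus: np.ndarray) -> np.ndarray:
    """Count differing bits between one packed query and each packed corpus row."""
    xor = np.bitwise_xor(corpus, query)
    return np.unpackbits(xor, axis=-1).sum(axis=-1)


# Stand-in data: random float32 vectors instead of real model embeddings.
rng = np.random.default_rng(0)
float_embeddings = rng.standard_normal((1000, 1024)).astype(np.float32)

binary_embeddings = binarize(float_embeddings)

# 4 bytes per dimension versus 1/8 byte per dimension.
storage_ratio = float_embeddings.nbytes / binary_embeddings.nbytes

# Retrieval by Hamming distance: a document's own code is its nearest match.
query = binarize(float_embeddings[42])
nearest = int(np.argmin(hamming_distances(query, binary_embeddings)))
```

The speed-up quoted above depends on the index and hardware, so this sketch only demonstrates the storage ratio and the Hamming-distance lookup, not the retrieval timing.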
Alternatives and similar repositories for binary-embeddings:
Users who are interested in binary-embeddings are comparing it to the libraries listed below.
- WIP: Ofen is a toolkit aimed at making transformer models production-ready. API included. ☆14 · Updated 5 months ago
- NLP with Rust for Python 🦀🐍 ☆61 · Updated 9 months ago
- Using open source LLMs to build synthetic datasets for direct preference optimization ☆59 · Updated last year
- Pre-train Static Word Embeddings ☆49 · Updated 2 weeks ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆34 · Updated 3 months ago
- Trace LLM calls (and others) and visualize them in WandB, as interactive SVG or using a streaming local webapp ☆14 · Updated last month
- ☆63 · Updated 3 months ago
- DSPy program/pipeline inspector widget for Jupyter/VSCode Notebooks. ☆33 · Updated last year
- Efficient few-shot learning with cross-encoders. ☆49 · Updated last year
- A library for simplifying fine tuning with multi gpu setups in the Huggingface ecosystem. ☆16 · Updated 4 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 · Updated last year
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by… ☆29 · Updated 7 months ago
- Using short models to classify long texts ☆21 · Updated 2 years ago
- Plug-and-play Search Interfaces with Pyserini and Hugging Face ☆31 · Updated last year
- Utilities for loading and running text embeddings with onnx ☆44 · Updated 7 months ago
- This is the repo for the container that holds the models for the text2vec-transformers module ☆49 · Updated last month
- Supervised instruction finetuning for LLM with HF trainer and Deepspeed ☆34 · Updated last year
- Starbucks: Improved Training for 2D Matryoshka Embeddings ☆18 · Updated last month
- Lightweight tools for quick and easy LLM demos ☆26 · Updated 6 months ago
- Library for fast text representation and classification. ☆28 · Updated last year
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial) ☆17 · Updated last year
- Writing Blog Posts with Generative Feedback Loops! ☆47 · Updated last year
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la… ☆47 · Updated last year
- Check for data drift between two OpenAI multi-turn chat jsonl files.