mixedbread-ai / binary-embeddings
Showcases how mxbai-embed-large-v1 can be used to produce binary embeddings. Binary embeddings enable 32x storage savings and 40x faster retrieval.
☆18Updated last year
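The 32x storage figure follows from replacing each 32-bit float with a single bit. Here is a minimal sketch of that arithmetic, not the repository's own code. It assumes a 1024-dimensional embedding, a sign threshold at zero, and random vectors standing in for real model output; `binarize` and `hamming` are hypothetical helper names.

```python
import random
from array import array

DIM = 1024  # assumed embedding size (must be a multiple of 8 here)


def binarize(vec):
    """Quantize floats to bits (1 where value > 0) and pack 8 bits per byte."""
    packed = bytearray(len(vec) // 8)
    for i, v in enumerate(vec):
        if v > 0:
            packed[i // 8] |= 1 << (7 - i % 8)
    return bytes(packed)


def hamming(a, b):
    """Count differing bits between two packed binary embeddings."""
    return bin(int.from_bytes(a, "big") ^ int.from_bytes(b, "big")).count("1")


# Random stand-ins for real embeddings (assumption: not model output).
rng = random.Random(0)
float_vec = array("f", (rng.gauss(0, 1) for _ in range(DIM)))
other_vec = array("f", (rng.gauss(0, 1) for _ in range(DIM)))

float_bytes = len(float_vec) * float_vec.itemsize  # 4 bytes per float32
bin_vec = binarize(float_vec)
ratio = float_bytes // len(bin_vec)

print(f"float32: {float_bytes} bytes, binary: {len(bin_vec)} bytes, ratio: {ratio}x")
print("Hamming distance to other vector:", hamming(bin_vec, binarize(other_vec)))
```

Retrieval over packed vectors then compares Hamming distances with XOR and bit counts instead of float dot products. This sketch does not measure speed, so it says nothing about the 40x retrieval figure.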
Alternatives and similar repositories for binary-embeddings
Users who are interested in binary-embeddings are comparing it to the libraries listed below.
- WIP: Ofen is a toolkit aimed at making transformer models production-ready. API included.☆15Updated 8 months ago
- Crispy reranking models by Mixedbread☆31Updated last month
- Pre-train Static Word Embeddings☆70Updated this week
- Truly flash implementation of the DeBERTa disentangled attention mechanism.☆55Updated 2 weeks ago
- NLP with Rust for Python 🦀🐍☆62Updated 3 weeks ago
- ☆43Updated 3 months ago
- Model implementation for the contextual embeddings project☆26Updated this week
- SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…☆48Updated last year
- ☆62Updated 10 months ago
- Efficient few-shot learning with cross-encoders.☆52Updated last year
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts☆24Updated last year
- Improving Text Embedding of Language Models Using Contrastive Fine-tuning☆64Updated 10 months ago
- Vector Database with support for late interaction and token level embeddings.☆54Updated 8 months ago
- Starbucks: Improved Training for 2D Matryoshka Embeddings☆20Updated 3 months ago
- 🤗 HuggingFace Inference Toolkit for Google Cloud Vertex AI (similar to SageMaker's Inference Toolkit, but for Vertex AI and unofficial)☆17Updated last year
- ☆57Updated 2 weeks ago
- ☆70Updated 5 months ago
- XTR: Rethinking the Role of Token Retrieval in Multi-Vector Retrieval☆51Updated 11 months ago
- Lightweight wrapper for the independent implementation of SPLADE++ models for search & retrieval pipelines. Models and Library created by…☆31Updated 9 months ago
- Lightweight tools for quick and easy LLM demos☆27Updated 8 months ago
- Efficiently computing & storing token n-grams from large corpora☆23Updated 7 months ago
- The Batched API provides a flexible and efficient way to process multiple requests in a batch, with a primary focus on dynamic batching o…☆136Updated last week
- CLIR version of ColBERT☆67Updated last month
- ☆10Updated last year
- Hugging Face Inference Toolkit used to serve transformers, sentence-transformers, and diffusers models.☆72Updated last week
- Check for data drift between two OpenAI multi-turn chat jsonl files.☆37Updated last year
- Using open source LLMs to build synthetic datasets for direct preference optimization☆63Updated last year
- A fast, local, and secure approach for training LLMs for coding tasks using GRPO with WebAssembly and interpreter feedback.☆24Updated 2 months ago
- Official Repository for "Hypencoder: Hypernetworks for Information Retrieval"☆25Updated 2 months ago
- "Syntriever: How to Train Your Retriever with Synthetic Data from LLMs" the Nations of the Americas Chapter of the Association for Comput…☆25Updated 2 months ago