MinishLab / model2vec
Fast State-of-the-Art Static Embeddings
☆1,109 Updated 3 weeks ago
Alternatives and similar repositories for model2vec:
Users who are interested in model2vec are comparing it to the libraries listed below.
- Fast Semantic Text Deduplication ☆582 Updated 3 weeks ago
- Use late-interaction multi-modal models such as ColPali in just a few lines of code. ☆753 Updated last month
- A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models. ☆1,338 Updated last month
- Fast lexical search implementing BM25 in Python using Numpy, Numba and Scipy ☆1,063 Updated last week
- Everything about the SmolLM2 and SmolVLM family of models ☆2,035 Updated last week
- Things you can do with the token embeddings of an LLM ☆1,433 Updated last month
- Bringing BERT into modernity via both architecture changes and scaling ☆1,283 Updated this week
- Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi… ☆2,568 Updated this week
- Build datasets using natural language ☆434 Updated 2 weeks ago
- Synthetic data curation for post-training and structured data extraction ☆1,049 Updated this week
- Recipes for shrinking, optimizing, customizing cutting edge vision models. 💜 ☆1,316 Updated last week
- ColiVara is a suite of services that allows you to store, search, and retrieve documents based on their visual embedding. ColiVara has st… ☆859 Updated last month
- A system for agentic LLM-powered data processing and ETL ☆1,718 Updated this week
- Fast, Accurate, Lightweight Python library to make State of the Art Embedding ☆1,882 Updated this week
- 🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library ☆2,818 Updated this week
- LOTUS: A semantic query engine for fast and easy LLM-powered data processing ☆1,127 Updated this week
- Lightweight Nearest Neighbors with Flexible Backends ☆260 Updated 3 weeks ago
- The code used to train and run inference with the ColVision models, e.g. ColPali, ColQwen2, and ColSmol. ☆1,614 Updated this week
- Optimizing inference proxy for LLMs ☆2,110 Updated this week
- TextGrad: Automatic "Differentiation" via Text -- using large language models to backpropagate textual gradients. ☆2,271 Updated this week
- ☆694 Updated this week
- 📚 Process PDFs, Word documents and more with spaCy ☆480 Updated 2 weeks ago
- Recipes for learning, fine-tuning, and adapting ColPali to your multimodal RAG use cases. 👨🏻🍳 ☆261 Updated 3 months ago
- Generalist and Lightweight Model for Named Entity Recognition (Extract any entity types from texts) @ NAACL 2024 ☆1,876 Updated this week
- Structured information extraction from documents ☆312 Updated 5 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆1,313 Updated this week
- In-Context Learning for eXtreme Multi-Label Classification (XMC) using only a handful of examples. ☆414 Updated last year
- Implementing the 4 agentic patterns from scratch ☆1,119 Updated this week
- 🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with PostgreSQL or SQLite ☆868 Updated this week