cohere-ai/DiskVectorIndex

☆209

Alternatives and similar repositories for DiskVectorIndex

Users that are interested in DiskVectorIndex are comparing it to the libraries listed below.

labteral / fastc
Unattended Lightweight Text Classifiers with LLM Embeddings
☆187 · Sep 6, 2024 · Updated last year
stephantul / pynife
Nearly Inference Free Embeddings: make your RAG queries 500x faster
☆80 · Apr 27, 2026 · Updated 2 months ago
davidberenstein1957 / dataset-viber
Dataset Viber is your chill repo for data collection, annotation and vibe checks.
☆47 · Sep 5, 2024 · Updated last year
flowaicom / flow-judge
Code for evaluating with Flow-Judge-v0.1 - an open-source, lightweight (3.8B) language model optimized for LLM system evaluations. Crafte…
☆86 · Oct 29, 2024 · Updated last year
ielab / Starbucks
Starbucks: Improved Training for 2D Matryoshka Embeddings
☆25 · Jun 30, 2025 · Updated last year
flairNLP / fundus
A very simple news crawler with a funny name
☆467 · Updated this week
cohere-ai / BinaryVectorDB
Efficient vector database for hundred millions of embeddings.
☆215 · May 17, 2024 · Updated 2 years ago
sail-sg / sailcraft
🚢 Data Toolkit for Sailor Language Models
☆94 · Feb 24, 2025 · Updated last year
kanpuriyanawab / minbpe.c
a Minimal, clean code for the Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization in pure C.
☆24 · Jul 6, 2024 · Updated 2 years ago
AnswerDotAI / rerankers
A lightweight, low-dependency, unified API to use all common reranking and cross-encoder models.
☆1,624 · Dec 20, 2025 · Updated 7 months ago
xhluca / bm25s
Fast BM25 search in Python, powered by Numpy and Numba
☆1,740 · Jul 7, 2026 · Updated 2 weeks ago
AnswerDotAI / RAGatouille
Easily use and train state of the art late-interaction retrieval methods (ColBERT) in any RAG pipeline. Designed for modularity and ease-…
☆3,939 · May 17, 2025 · Updated last year
huggingface / setfit
Efficient few-shot learning with Sentence Transformers
☆2,772 · May 26, 2026 · Updated last month
lightonai / pylate
Late Interaction Models Training & Retrieval
☆875 · Jul 13, 2026 · Updated last week
MidiyaZhu / MePO
Code for Rethinking Prompt Optimizers: From Prompt Merits to Optimization
☆13 · Jan 12, 2026 · Updated 6 months ago
Knowledgator / GLiClass
Generalist and Lightweight Model for Text Classification
☆233 · Updated this week
IBM / fastfit
FastFit ⚡ When LLMs are Unfit Use FastFit ⚡ Fast and Effective Text Classification with Many Classes
☆220 · Sep 18, 2025 · Updated 10 months ago
cisnlp / multypo
A Multilingual Keyboard Layout-Based Typo Generator
☆17 · Nov 23, 2025 · Updated 7 months ago
MantisAI / sieves
Plug-and-play document AI with zero-shot models.
☆126 · May 11, 2026 · Updated 2 months ago
PrithivirajDamodaran / Route0x
Low latency, High Accuracy, Custom Query routers for Humans and Agents. Built by Prithivi Da
☆122 · Mar 31, 2025 · Updated last year
vagos / llm-interpolate
Interpolate between embedding points with llm
☆38 · Jul 17, 2024 · Updated 2 years ago
TREC-RAG / trec-rag.github.io
Website for TREC RAG
☆14 · Updated this week
brendanhogan / completion_tree_view
☆15 · Apr 26, 2025 · Updated last year
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,334 · Jul 13, 2026 · Updated last week
rmartinshort / text_chunking
Exploration of semantic chunking and chunk classification
☆19 · Sep 16, 2024 · Updated last year
microsoft / MS-MARCO-Web-Search
A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
☆351 · Dec 16, 2024 · Updated last year
PrithivirajDamodaran / FlashRank
Lite & Super-fast re-ranking for your search & retrieval pipelines. Supports SoTA Listwise and Pairwise reranking based on LLMs and cro…
☆994 · Jul 11, 2026 · Updated last week
Technoculture / personal-graph
Simple Graph Memory for AI applications
☆104 · Feb 23, 2026 · Updated 4 months ago
Vaibhavs10 / optimise-my-whisper
☆207 · May 27, 2024 · Updated 2 years ago
naver / splade
SPLADE: sparse neural search (SIGIR21, SIGIR22)
☆999 · May 3, 2024 · Updated 2 years ago
tomaarsen / SpanMarkerNER
SpanMarker for Named Entity Recognition
☆477 · Apr 10, 2026 · Updated 3 months ago
v-prgmr / mergekit
Tools for merging pretrained large language models.
☆19 · Jun 12, 2024 · Updated 2 years ago
microsoft / MInference
[NeurIPS'24 Spotlight, ICLR'25, ICML'25] To speed up Long-context LLMs' inference, approximate and dynamic sparse calculate the attention…
☆1,221 · Apr 8, 2026 · Updated 3 months ago
google-research-datasets / swim-ir
SWIM-IR is a Synthetic Wikipedia-based Multilingual Information Retrieval training set with 28 million query-passage pairs spanning 33 la…
☆50 · Nov 13, 2023 · Updated 2 years ago
shmulvad / zero-for-ner
Zero-Shot Learning in Named Entity Recognition with Common Sense Knowledge
☆17 · Nov 16, 2021 · Updated 4 years ago
microsoft / Interactive-Summarization
The official repo of our research work "Interactive Editing for Text Summarization".
☆23 · Jun 3, 2023 · Updated 3 years ago
jmtomczak / vae_kan_example
A simple example of VAEs with KANs
☆12 · May 17, 2024 · Updated 2 years ago
qdrant / fastembed
Fast, Accurate, Lightweight Python library to make State of the Art Embedding
☆3,094 · Updated this week
MinishLab / semhash
Fast Multimodal Semantic Deduplication & Filtering
☆946 · May 24, 2026 · Updated last month