ytgui / PilotANN
Memory-Bounded GPU Acceleration for Vector Search
☆31 · Updated last month
Alternatives and similar repositories for PilotANN
Users who are interested in PilotANN are also comparing it to the libraries listed below.
- Code repository for the paper "AdANNS: A Framework for Adaptive Semantic Search" ☆65 · Updated 2 years ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆28 · Updated last year
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆66 · Updated last month
- Compression for Foundation Models ☆34 · Updated 4 months ago
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆83 · Updated 10 months ago
- [VLDB 25] Maximum Inner Product is Query-Scaled Nearest Neighbor ☆34 · Updated last month
- Bamboo-7B Large Language Model ☆93 · Updated last year
- Collection of datasets for benchmarking filtered vector similarity retrieval ☆55 · Updated 6 months ago
- Official code for "Binary embedding based retrieval at Tencent" ☆44 · Updated last year
- Graph Library for Approximate Similarity Search ☆135 · Updated 2 months ago
- ☆61 · Updated 6 months ago
- A fast header-only graph-based index for approximate nearest neighbor search (ANNS). https://flatnav.net ☆39 · Updated 5 months ago
- ☆46 · Updated 7 months ago
- Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system. ☆104 · Updated 2 months ago
- ⚡ Faster similarity search with PDX: A vertical data layout for vectors ☆63 · Updated 3 months ago
- A library of algorithms for approximate nearest neighbor search in high dimensions, along with a set of useful tools for designing such a… ☆173 · Updated 2 months ago
- PiKV: KV Cache Management System for Mixture of Experts [Efficient ML System] ☆44 · Updated last month
- ☆194 · Updated this week
- ☆13 · Updated 10 months ago
- [ICLR2025] Breaking Throughput-Latency Trade-off for Long Sequences with Speculative Decoding ☆133 · Updated last year
- MSVBASE is a system that efficiently supports complex queries of both approximate similarity search and relational operators. It integrat… ☆101 · Updated last year
- ☆27 · Updated 7 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆173 · Updated 7 months ago
- Modular and structured prompt caching for low-latency LLM inference ☆103 · Updated last year
- Large Scale Search Index ☆31 · Updated 2 years ago
- ☆39 · Updated last year
- GGNN: State of the Art Graph-based GPU Nearest Neighbor Search ☆166 · Updated 9 months ago
- Beyond KV Caching: Shared Attention for Efficient LLMs ☆20 · Updated last year
- ☆48 · Updated last year
- ☆25 · Updated 2 months ago