ytgui / PilotANN
Memory-Bounded GPU Acceleration for Vector Search
☆28 · Updated 5 months ago
Alternatives and similar repositories for PilotANN
Users who are interested in PilotANN are comparing it to the libraries listed below.
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆65 · Updated last year
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆64 · Updated 11 months ago
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆82 · Updated 7 months ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆24 · Updated last year
- Compression for Foundation Models ☆35 · Updated last month
- Graph Library for Approximate Similarity Search ☆129 · Updated last month
- A library of algorithms for approximate nearest neighbor search in high dimensions, along with a set of useful tools for designing such a… ☆158 · Updated this week
- Official code for "Binary embedding based retrieval at Tencent" ☆43 · Updated last year
- ☆55 · Updated 3 months ago
- [VLDB 25] Maximum Inner Product is Query-Scaled Nearest Neighbor ☆30 · Updated 3 months ago
- Collection of datasets for benchmarking filtered vector similarity retrieval ☆49 · Updated 2 months ago
- ☆38 · Updated last year
- Samples of good AI generated CUDA kernels ☆89 · Updated 3 months ago
- MSVBASE is a system that efficiently supports complex queries of both approximate similarity search and relational operators. It integrat… ☆97 · Updated 9 months ago
- Bamboo-7B Large Language Model ☆93 · Updated last year
- Faster Learned Sparse Retrieval with Block-Max Pruning. ACM SIGIR 2024. ☆31 · Updated last month
- ⚡ Faster similarity search with PDX: A vertical data layout for vectors ☆54 · Updated 3 weeks ago
- A fast header-only graph-based index for approximate nearest neighbor search (ANNS). https://flatnav.net ☆35 · Updated 2 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆155 · Updated 4 months ago
- ☆41 · Updated 4 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated 10 months ago
- Scalable long-context LLM decoding that leverages sparsity by treating the KV cache as a vector storage system. ☆77 · Updated 2 weeks ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆49 · Updated 5 months ago
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆39 · Updated last month
- CUDA implementation of Hierarchical Navigable Small World Graph algorithm ☆163 · Updated 4 years ago
- ☆22 · Updated 4 months ago
- Cascade Speculative Drafting ☆29 · Updated last year
- Official software repository of L. Delfino, D. Erriquez, S. Martinico, F. M. Nardini, C. Rulli, and R. Venturini. "kANNolo: Sweet and Smo… ☆41 · Updated 2 weeks ago
- Modular and structured prompt caching for low-latency LLM inference ☆99 · Updated 9 months ago
- ☆18 · Updated 2 weeks ago