ytgui / PilotANN
Memory-Bounded GPU Acceleration for Vector Search
☆23 Updated 3 weeks ago
Alternatives and similar repositories for PilotANN:
Users who are interested in PilotANN are comparing it to the libraries listed below.
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆64 Updated last year
- ☆15 Updated last year
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆61 Updated 6 months ago
- Compression for Foundation Models ☆31 Updated 3 weeks ago
- A fast header-only graph-based index for approximate nearest neighbor search (ANNS). https://flatnav.net ☆20 Updated this week
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆77 Updated 3 months ago
- ☆45 Updated 7 months ago
- Cascade Speculative Drafting ☆29 Updated last year
- ☆16 Updated last month
- LLM reads a paper and produces a working prototype ☆52 Updated 2 weeks ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite. ☆33 Updated last year
- Towards LLM Empowered Recommendation via Tool Learning ☆15 Updated 11 months ago
- Understanding the correlation between different LLM benchmarks ☆29 Updated last year
- ☆13 Updated this week
- ☆20 Updated 10 months ago
- Very minimal (and stateless) agent framework ☆42 Updated 3 months ago
- Retrieval with Learned Similarities (http://arxiv.org/abs/2407.15462, WWW'25 Oral) ☆41 Updated 2 months ago
- Repository for the Q-Filters method (https://arxiv.org/pdf/2503.02812) ☆30 Updated last month
- Code repository for the public reproduction of the language modelling experiments on "MatFormer: Nested Transformer for Elastic Inference… ☆19 Updated last year
- Latent Large Language Models ☆17 Updated 8 months ago
- Official code for "Binary embedding based retrieval at Tencent" ☆42 Updated last year
- MPI Code Generation through Domain-Specific Language Models ☆13 Updated 5 months ago
- Official Repository of "GraphTeam: Facilitating Large Language Model-based Graph Analysis via Multi-Agent Collaboration". ☆30 Updated 3 weeks ago
- ☆28 Updated 5 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆36 Updated 11 months ago
- Using FlexAttention to compute attention with different masking patterns ☆43 Updated 7 months ago
- XmodelLM ☆39 Updated 5 months ago
- ReBase: Training Task Experts through Retrieval Based Distillation ☆29 Updated 2 months ago
- ☆18 Updated 6 months ago
- ☆19 Updated 8 months ago