ytgui / PilotANN
Memory-Bounded GPU Acceleration for Vector Search
☆27 Updated 4 months ago
Alternatives and similar repositories for PilotANN
Users who are interested in PilotANN are comparing it to the libraries listed below.
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆65 Updated last year
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆81 Updated 6 months ago
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆63 Updated 10 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 Updated 4 months ago
- Compression for Foundation Models ☆34 Updated 2 weeks ago
- Port of Facebook's LLaMA model in C/C++ ☆22 Updated last year
- ☆40 Updated 3 months ago
- ☆54 Updated 2 months ago
- A public implementation of the ReLoRA pretraining method, built on Lightning-AI's Pytorch Lightning suite. ☆33 Updated last year
- Nexusflow function call, tool use, and agent benchmarks. ☆27 Updated 7 months ago
- XTR/WARP (SIGIR'25) is an extremely fast and accurate retrieval engine based on Stanford's ColBERTv2/PLAID and Google DeepMind's XTR. ☆152 Updated 3 months ago
- PipeRAG: Fast Retrieval-Augmented Generation via Algorithm-System Co-design (KDD 2025) ☆22 Updated last year
- Lightweight toolkit package to train and fine-tune 1.58bit Language models ☆82 Updated 2 months ago
- ☆20 Updated 3 months ago
- ☆19 Updated 5 months ago
- Optimizing Causal LMs through GRPO with weighted reward functions and automated hyperparameter tuning using Optuna ☆55 Updated 6 months ago
- Samples of good AI generated CUDA kernels ☆86 Updated 2 months ago
- Latent Large Language Models ☆18 Updated 11 months ago
- Repository for Sparse Finetuning of LLMs via modified version of the MosaicML llmfoundry ☆42 Updated last year
- KV Cache Steering for Inducing Reasoning in Small Language Models ☆36 Updated 2 weeks ago
- Self-host LLMs with LMDeploy and BentoML ☆22 Updated last month
- Official code for "Binary embedding based retrieval at Tencent" ☆43 Updated last year
- code for training and using chess embeddings models ☆12 Updated last year
- Cascade Speculative Drafting ☆29 Updated last year
- MPI Code Generation through Domain-Specific Language Models ☆14 Updated 8 months ago
- Repo hosting codes and materials related to speeding LLMs' inference using token merging. ☆36 Updated 2 weeks ago
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera… ☆81 Updated this week
- [VLDB 25] Maximum Inner Product is Query-Scaled Nearest Neighbor ☆29 Updated 2 months ago
- Simple high-throughput inference library ☆125 Updated 2 months ago
- Very minimal (and stateless) agent framework ☆45 Updated 6 months ago