fabiocannizzo / FastBinarySearch
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
☆128 Updated last month
Alternatives and similar repositories for FastBinarySearch:
Users who are interested in FastBinarySearch are comparing it to the libraries listed below.
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆59 Updated 4 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆43 Updated this week
- High-Performance SGEMM on CUDA devices ☆73 Updated 3 weeks ago
- LLM training in simple, raw C/CUDA ☆91 Updated 9 months ago
- GGML implementation of BERT model with Python bindings and quantization. ☆53 Updated 11 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆45 Updated this week
- FlexAttention w/ FlashAttention3 Support ☆26 Updated 4 months ago
- Benchmarks to capture important workloads. ☆29 Updated 2 weeks ago
- Official repository of kANNolo. ☆21 Updated 2 months ago
- ☆12 Updated 3 years ago
- RWKV-7: Surpassing GPT ☆76 Updated 2 months ago
- asynchronous/distributed speculative evaluation for llama3 ☆37 Updated 6 months ago
- extensible collectives library in triton ☆82 Updated 4 months ago
- 🔶 Compressed bitvector/container supporting efficient random access and rank queries ☆43 Updated 5 months ago
- Make triton easier ☆43 Updated 8 months ago
- A parallel framework for training deep neural networks ☆51 Updated 3 weeks ago
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆114 Updated 2 months ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆33 Updated 9 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 Updated 5 months ago
- Learning about CUDA by writing PTX code. ☆35 Updated 11 months ago
- CUDA implementation of Hierarchical Navigable Small World Graph algorithm ☆153 Updated 3 years ago
- A tracing JIT compiler for PyTorch ☆12 Updated 3 years ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆120 Updated this week
- Official code for "Binary embedding based retrieval at Tencent" ☆42 Updated 11 months ago
- Code repository for the paper - "AdANNS: A Framework for Adaptive Semantic Search" ☆62 Updated last year
- Example ML projects that use the Determined library. ☆26 Updated 5 months ago
- ☆64 Updated 2 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆38 Updated 9 months ago