fabiocannizzo / FastBinarySearch
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
☆132 · Updated 3 months ago
Alternatives and similar repositories for FastBinarySearch:
Users who are interested in FastBinarySearch are comparing it to the libraries listed below.
- LLM training in simple, raw C/CUDA ☆92 · Updated 10 months ago
- High-Performance SGEMM on CUDA devices ☆87 · Updated 2 months ago
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆74 · Updated 2 months ago
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆60 · Updated 5 months ago
- Inference of Mamba models in pure C ☆186 · Updated last year
- asynchronous/distributed speculative evaluation for llama3 ☆39 · Updated 7 months ago
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 5 months ago
- Make triton easier ☆47 · Updated 9 months ago
- ☆81 · Updated last week
- CUDA implementation of Hierarchical Navigable Small World Graph algorithm ☆155 · Updated 3 years ago
- GGML implementation of BERT model with Python bindings and quantization. ☆56 · Updated last year
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆255 · Updated this week
- 🔶 Compressed bitvector/container supporting efficient random access and rank queries ☆43 · Updated 6 months ago
- ☆62 · Updated last month
- Official repository of kANNolo. ☆26 · Updated 4 months ago
- ☆73 · Updated 4 months ago
- extensible collectives library in triton ☆84 · Updated 6 months ago
- This small library enables acceleration of bulk calls of certain math functions on AVX and AVX2 hardware. Currently supported operations … ☆88 · Updated 3 years ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆111 · Updated 4 months ago
- No-GIL Python environment featuring NVIDIA Deep Learning libraries. ☆53 · Updated 3 weeks ago
- Clover: Quantized 4-bit Linear Algebra Library ☆112 · Updated 6 years ago
- Standalone commandline CLI tool for compiling Triton kernels ☆17 · Updated 6 months ago
- Lightweight Llama 3 8B Inference Engine in CUDA C ☆47 · Updated this week
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆272 · Updated last year
- A SIMD-based C++ library providing rank/select queries over mutable bitmaps. ☆35 · Updated 2 years ago
- Port of Facebook's LLaMA model in C/C++ ☆20 · Updated last year
- A C++ library providing fast language model queries in compressed space. ☆129 · Updated 2 years ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆35 · Updated 10 months ago
- Common Index File Format to support interoperability between open-source IR engines ☆32 · Updated 6 months ago
- Reference Kernels for the Leaderboard ☆23 · Updated 3 weeks ago