fabiocannizzo / FastBinarySearch
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
☆137 · Updated 4 months ago
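As a rough illustration of the kind of algorithm the repository targets, below is a minimal branchless binary search over a sorted float array in C++. This is a sketch only, not FastBinarySearch's actual API: the function name `branchless_lower_index` and its signature are assumptions made for this example.

```cpp
// Illustrative sketch only: a branchless binary search over sorted floats,
// in the spirit of the algorithms the repository optimizes. The function name
// and signature are assumptions for this example, not FastBinarySearch's API.
#include <cstddef>
#include <vector>

// Returns the index of the last element <= key, assuming `data` is sorted in
// ascending order and data[0] <= key. The loop has no data-dependent branches
// (the compiler can emit a conditional move), which is what makes this style
// of search amenable to SIMD batching across many keys.
std::size_t branchless_lower_index(const std::vector<float>& data, float key)
{
    std::size_t lo = 0;
    std::size_t n  = data.size();
    while (n > 1) {
        const std::size_t half = n / 2;
        // Advance lo past the midpoint if the midpoint is still <= key.
        lo = (data[lo + half] <= key) ? lo + half : lo;
        n -= half;
    }
    return lo;
}
```

A vectorized variant would run this fixed-trip-count loop on several keys at once using SIMD compares and blends; see the repository for the actual scalar and SIMD implementations.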
Alternatives and similar repositories for FastBinarySearch
Users interested in FastBinarySearch are comparing it to the libraries listed below.
- High-Performance SGEMM on CUDA devices ☆91 · Updated 3 months ago
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆78 · Updated 3 months ago
- LLM training in simple, raw C/CUDA ☆95 · Updated last year
- extensible collectives library in triton ☆86 · Updated last month
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆61 · Updated 7 months ago
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆138 · Updated last week
- Inference of Mamba models in pure C ☆188 · Updated last year
- GGML implementation of BERT model with Python bindings and quantization. ☆56 · Updated last year
- Code for the paper "QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models". ☆274 · Updated last year
- FlexAttention w/ FlashAttention3 Support ☆26 · Updated 7 months ago
- Clover: Quantized 4-bit Linear Algebra Library ☆112 · Updated 6 years ago
- 🔶 Compressed bitvector/container supporting efficient random access and rank queries ☆43 · Updated 8 months ago
- Small scale distributed training of sequential deep learning models, built on Numpy and MPI. ☆132 · Updated last year
- ☆13 · Updated 3 years ago
- Make triton easier ☆47 · Updated 11 months ago
- ☆73 · Updated 5 months ago
- GPU benchmark ☆61 · Updated 3 months ago
- ☆85 · Updated last month
- TileFusion is an experimental C++ macro kernel template library that elevates the abstraction level in CUDA C for tile processing. ☆85 · Updated this week
- ☆159 · Updated this week
- RWKV-7: Surpassing GPT ☆84 · Updated 5 months ago
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆73 · Updated 8 months ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆109 · Updated last week
- Experiment of using Tangent to autodiff triton ☆78 · Updated last year
- Explore training for quantized models ☆18 · Updated 4 months ago
- ☆69 · Updated last month
- A thin, highly portable toolkit for efficiently compiling dense loop-based computation. ☆148 · Updated 2 years ago
- Hydragen: High-Throughput LLM Inference with Shared Prefixes ☆36 · Updated last year
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆72 · Updated 3 months ago
- int8_t and int16_t matrix multiply based on https://arxiv.org/abs/1705.01991 ☆71 · Updated last year