fabiocannizzo / FastBinarySearch
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
☆127 Updated last month
Alternatives and similar repositories for FastBinarySearch:
Users who are interested in FastBinarySearch are comparing it to the libraries listed below.
- SGEMM that beats cuBLAS ☆68 Updated last week
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆240 Updated this week
- Explore training for quantized models ☆13 Updated 3 weeks ago
- CUDA implementation of Hierarchical Navigable Small World Graph algorithm ☆150 Updated 3 years ago
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆66 Updated this week
- Inference of Mamba models in pure C ☆183 Updated 11 months ago
- LLM training in simple, raw C/CUDA ☆91 Updated 8 months ago
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆59 Updated 3 months ago
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆102 Updated 2 months ago
- A minimal implementation of vllm. ☆33 Updated 6 months ago
- ☆63 Updated 2 months ago
- extensible collectives library in triton ☆77 Updated 4 months ago
- Clover: Quantized 4-bit Linear Algebra Library ☆111 Updated 6 years ago
- FlexAttention w/ FlashAttention3 Support ☆27 Updated 3 months ago
- ☆12 Updated 3 years ago
- Fast low-bit matmul kernels in Triton ☆199 Updated last week
- If only std::set was a DBMS: collection of templated ACID in-memory exception-free thread-safe and concurrent containers in a header-only… ☆37 Updated last year
- ☆58 Updated 8 months ago
- ☆171 Updated last week
- Boosting 4-bit inference kernels with 2:4 Sparsity ☆64 Updated 4 months ago
- ☆64 Updated 2 months ago
- ☆279 Updated last week
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆113 Updated last month
- Memory Optimizations for Deep Learning (ICML 2023) ☆62 Updated 10 months ago
- ☆79 Updated 2 weeks ago
- Make triton easier ☆44 Updated 7 months ago
- Lightning In-Memory Object Store ☆44 Updated 3 years ago
- GGML implementation of BERT model with Python bindings and quantization. ☆53 Updated 11 months ago
- Massively Parallel Huffman Decoding on GPUs ☆47 Updated 5 years ago