fabiocannizzo / FastBinarySearch
Fast and vectorizable algorithms for searching in a vector of sorted floating point numbers
☆153 · Updated last year
Alternatives and similar repositories for FastBinarySearch
Users interested in FastBinarySearch are comparing it to the libraries listed below.
- LLM training in simple, raw C/CUDA ☆112 · Updated last year
- High-Performance FP32 GEMM on CUDA devices ☆117 · Updated last year
- Implementation of the paper "Lossless Compression of Vector IDs for Approximate Nearest Neighbor Search" by Severo et al. ☆89 · Updated 2 weeks ago
- Inference of Mamba and Mamba2 models in pure C ☆196 · Updated this week
- A tracing JIT compiler for PyTorch ☆13 · Updated 4 years ago
- A stand-alone implementation of several NumPy dtype extensions used in machine learning. ☆327 · Updated 3 weeks ago
- Simple high-throughput inference library ☆155 · Updated 8 months ago
- Implementation of "Efficient Multi-vector Dense Retrieval with Bit Vectors", ECIR 2024 ☆68 · Updated 3 months ago
- 🏙 Interactive performance profiling and debugging tool for PyTorch neural networks. ☆64 · Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates ☆202 · Updated 4 months ago
- Make Triton easier ☆50 · Updated last year
- ☆71 · Updated 10 months ago
- Extensible collectives library in Triton ☆93 · Updated 9 months ago
- Clover: Quantized 4-bit Linear Algebra Library ☆114 · Updated 7 years ago
- Nod.ai 🦈 version of 👻. You probably want to start at https://github.com/nod-ai/shark for the product and the upstream IREE repository… ☆107 · Updated last month
- Standalone command-line tool for compiling Triton kernels ☆20 · Updated last year
- A Python library that transfers PyTorch tensors between CPU and NVMe ☆125 · Updated last year
- A safetensors extension to efficiently store sparse quantized tensors on disk ☆234 · Updated last week
- ☆24 · Updated last year
- CUDA-L2: Surpassing cuBLAS Performance for Matrix Multiplication through Reinforcement Learning ☆356 · Updated 2 weeks ago
- Samples of good AI-generated CUDA kernels ☆99 · Updated 7 months ago
- torch::deploy (multipy for non-torch uses) is a system that lets you get around the GIL problem by running multiple Python interpreters i… ☆182 · Updated last month
- Benchmarks to capture important workloads. ☆32 · Updated 2 weeks ago
- GPU benchmark ☆73 · Updated 11 months ago
- ☆27 · Updated 2 years ago
- Write a fast kernel and run it on Discord. See how you compare against the best! ☆68 · Updated this week
- Prepare for DeepSeek R1 inference: Benchmark CPU, DRAM, SSD, iGPU, GPU, ... with efficient code. ☆74 · Updated 11 months ago
- An experimental CPU backend for Triton (https://github.com/openai/triton) ☆48 · Updated 5 months ago
- FlexAttention w/ FlashAttention3 Support ☆27 · Updated last year
- Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters ☆131 · Updated last year