kioxia-jp / aisaq-diskann
All-in-Storage Solution based on DiskANN for DRAM-free Approximate Nearest Neighbor Search
☆73Updated 2 months ago
Alternatives and similar repositories for aisaq-diskann
Users interested in aisaq-diskann are comparing it to the libraries listed below.
- InferX is an Inference Function-as-a-Service platform☆133Updated this week
- No-code CLI designed for accelerating ONNX workflows☆214Updated 3 months ago
- Lightweight Inference server for OpenVINO☆211Updated this week
- DCPerf benchmark suite for hyperscale cloud applications☆204Updated this week
- Tenstorrent console based hardware information program☆53Updated last week
- Run multiple resource-heavy Large Models (LM) on the same machine with a limited amount of VRAM/other resources by exposing them on differe…☆82Updated this week
- The HIP Environment and ROCm Kit - A lightweight open source build system for HIP and ROCm☆379Updated this week
- This is the documentation repository for SGLang. It is auto-generated from https://github.com/sgl-project/sglang/tree/main/docs.☆76Updated this week
- High-speed and easy-to-use LLM serving framework for local deployment☆118Updated last month
- AI Tensor Engine for ROCm☆276Updated this week
- Lightweight daemon for monitoring CUDA runtime API calls with eBPF uprobes☆128Updated 5 months ago
- A C++ distributed framework for responsive Cloud applications.☆81Updated last month
- AI/GPU flame graph☆184Updated last month
- Rust crates for XetHub☆60Updated 11 months ago
- Bamboo-7B Large Language Model☆93Updated last year
- ☆33Updated 3 weeks ago
- Horizon chart for CPU/GPU/Neural Engine utilization monitoring. Supports Apple M1-M4, Nvidia GPUs, AMD GPUs☆26Updated last month
- Build userspace NVMe drivers and storage applications with CUDA support☆388Updated last year
- A minimalistic C++ Jinja templating engine for LLM chat templates☆180Updated last week
- Inference code for LLaMA models☆42Updated 2 years ago
- DIS: blockDevice over Immutable Storage☆69Updated 3 years ago
- Input your VRAM and RAM and the toolchain will produce a GGUF model tuned to your system within seconds — flexible model sizing and lowes…☆43Updated this week
- Fast block-level file diffs (e.g. for VM disk images) using CoW filesystem metadata☆147Updated 2 months ago
- Wraps any OpenAI API interface as Responses with MCP support so it works with Codex, adding any missing stateful features. Ollama and Vllm…☆88Updated 2 months ago
- LLM Inference on consumer devices☆124Updated 6 months ago
- A platform to self-host AI on easy mode☆163Updated this week
- High-performance safetensors model loader☆58Updated 2 months ago
- Transformer GPU VRAM estimator☆66Updated last year
- ☆39Updated last week
- vLLM: A high-throughput and memory-efficient inference and serving engine for LLMs☆89Updated this week