Inference server benchmarking tool
☆160 · May 26, 2026 · Updated 2 weeks ago
Alternatives and similar repositories for inference-benchmarker
Users that are interested in inference-benchmarker are comparing it to the libraries listed below.
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs ☆29 · Dec 17, 2024 · Updated last year
- Minimal implementation of a Byte Pair Encoding (BPE) tokenizer in Zig ☆14 · Apr 7, 2025 · Updated last year
- ☆29 · May 26, 2026 · Updated 2 weeks ago
- Prometheus exporter for Linux based GDDR6/GDDR6X VRAM and GPU Core Hot spot temperature reader for NVIDIA RTX 3000/4000 series GPUs. ☆26 · Oct 2, 2024 · Updated last year
- ☆16 · Dec 16, 2024 · Updated last year
- ☆15 · Jun 12, 2024 · Updated last year
- 👷 Build compute kernels ☆213 · Apr 6, 2026 · Updated 2 months ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers. ☆32 · Sep 19, 2025 · Updated 8 months ago
- The backend behind the LLM-Perf Leaderboard ☆11 · May 5, 2024 · Updated 2 years ago
- Contains the model patches and the eval logs from the passing swe-bench-lite run. ☆10 · Jun 28, 2024 · Updated last year
- Personal voice assistant, with voice interruption and Twilio support ☆18 · Feb 24, 2025 · Updated last year
- Where GPUs get cooked 👩‍🍳🔥 ☆395 · May 26, 2026 · Updated 2 weeks ago
- Efficient platform for inference and serving local LLMs including an OpenAI compatible API server. ☆670 · May 26, 2026 · Updated 2 weeks ago
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and … ☆39 · Aug 29, 2025 · Updated 9 months ago
- Build compute kernels and load them from the Hub. ☆676 · Updated this week
- Clue inspired puzzles for testing LLM deduction abilities ☆47 · Mar 19, 2026 · Updated 2 months ago
- Rustic bindings to the IREE Compiler/Runtime ☆27 · Aug 18, 2025 · Updated 9 months ago
- A Lightweight Library for AI Observability ☆253 · Feb 20, 2025 · Updated last year
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O… ☆337 · May 26, 2026 · Updated 2 weeks ago
- A template code for running modular and reproducible experiments in pytorch ☆13 · Sep 3, 2025 · Updated 9 months ago
- Various LLM Benchmarks ☆26 · Feb 20, 2026 · Updated 3 months ago
- A simple no-install web UI for Ollama and OAI-Compatible APIs! ☆31 · Jan 30, 2025 · Updated last year
- Synthetic data for fine tuning LLM ☆27 · Dec 26, 2024 · Updated last year
- Alternate Implementation for Zero Shot Text Classification: Instead of reframing NLI/XNLI, this reframes the text backbone of CLIP models… ☆37 · Apr 5, 2022 · Updated 4 years ago
- ☆12 · Dec 30, 2020 · Updated 5 years ago
- RepGhostNetV2: When RepGhost meets MobileNetV4 ☆16 · May 29, 2024 · Updated 2 years ago
- Inference engine for GLiNER models, in Rust ☆141 · Apr 21, 2026 · Updated last month
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens. ☆14 · Mar 30, 2024 · Updated 2 years ago
- MindMapper is an innovative program that empowers intelligent agents to navigate complex thought landscapes and collaboratively map their… ☆34 · Mar 25, 2026 · Updated 2 months ago
- 🦜 Zammad integration into Nextcloud ☆26 · Updated this week
- An in-memory compressed cache for gigabytes of data written in Go. ☆19 · Feb 6, 2023 · Updated 3 years ago
- ☆31 · Apr 8, 2026 · Updated 2 months ago
- Transformer Library Docker Stacks ☆14 · Feb 4, 2022 · Updated 4 years ago
- FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens. ☆1,082 · Sep 4, 2024 · Updated last year
- A C++ generic programming library for machine learning ☆12 · Nov 10, 2025 · Updated 6 months ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis. ☆68 · Feb 8, 2023 · Updated 3 years ago
- Cluster doctor skills ☆14 · May 23, 2026 · Updated 2 weeks ago
- ☆14 · Dec 1, 2025 · Updated 6 months ago
- A blazing fast inference solution for text embeddings models ☆4,840 · May 26, 2026 · Updated 2 weeks ago