Inference server benchmarking tool
☆158Apr 24, 2026Updated 3 weeks ago
Alternatives and similar repositories for inference-benchmarker
Users who are interested in inference-benchmarker are comparing it to the libraries listed below.
- Minimal implementation of a Byte Pair Encoding (BPE) tokenizer in Zig☆14Apr 7, 2025Updated last year
- Random apps and utilities☆16Mar 1, 2024Updated 2 years ago
- ☆16Dec 16, 2024Updated last year
- ☆15Jun 12, 2024Updated last year
- 👷 Build compute kernels☆213Apr 6, 2026Updated last month
- ☆13Mar 29, 2024Updated 2 years ago
- A Python wrapper around HuggingFace's TGI (text-generation-inference) and TEI (text-embedding-inference) servers.☆32Sep 19, 2025Updated 8 months ago
- The backend behind the LLM-Perf Leaderboard☆11May 5, 2024Updated 2 years ago
- The home of the Streamlit graph visualization component powered by yFiles for HTML☆19Nov 18, 2025Updated 6 months ago
- Where GPUs get cooked 👩‍🍳🔥☆389Apr 8, 2026Updated last month
- Efficient platform for inference and serving local LLMs including an OpenAI-compatible API server.☆656Updated this week
- Modified Beam Search with periodical restart☆12Sep 12, 2024Updated last year
- Official Repo for "SplitQuant / LLM-PQ: Resource-Efficient LLM Offline Serving on Heterogeneous GPUs via Phase-Aware Model Partition and …☆39Aug 29, 2025Updated 8 months ago
- Build compute kernels and load them from the Hub.☆638Updated this week
- ☆16May 1, 2023Updated 3 years ago
- ANE accelerated embedding models!☆19Dec 11, 2024Updated last year
- Clue inspired puzzles for testing LLM deduction abilities☆47Mar 19, 2026Updated 2 months ago
- Seamlessly integrate marimo reactive notebooks into JupyterLab and JupyterHub☆74Updated this week
- A Lightweight Library for AI Observability☆254Feb 20, 2025Updated last year
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…☆336Apr 3, 2026Updated last month
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs …☆65Feb 6, 2025Updated last year
- A simple no-install web UI for Ollama and OAI-Compatible APIs!☆31Jan 30, 2025Updated last year
- PyTorch Quantization Framework For OCP MX Datatypes.☆16May 30, 2025Updated 11 months ago
- Synthetic data for fine tuning LLM☆27Dec 26, 2024Updated last year
- The DPAB-α Benchmark☆32Jan 15, 2025Updated last year
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"☆38May 24, 2024Updated last year
- ☆12Dec 30, 2020Updated 5 years ago
- [KDD24-ADS] R-Eval: A Unified Toolkit for Evaluating Domain Knowledge of Retrieval Augmented Large Language Models☆11Apr 9, 2024Updated 2 years ago
- Inference engine for GLiNER models, in Rust☆135Apr 21, 2026Updated 3 weeks ago
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond☆1,039May 13, 2026Updated last week
- Serverless AI Inference with Gemma 2 using Mozilla's llamafile on AWS Lambda☆11Jul 30, 2024Updated last year
- KoboldCpp Smart Launcher with GPU Layer and Tensor Override Tuning☆30May 18, 2025Updated last year
- ☆47Feb 7, 2024Updated 2 years ago
- Dria SDK is for building and executing synthetic data generation pipelines on Dria Knowledge Network.☆28Apr 3, 2025Updated last year
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.☆14Mar 30, 2024Updated 2 years ago
- MindMapper is an innovative program that empowers intelligent agents to navigate complex thought landscapes and collaboratively map their…☆35Mar 25, 2026Updated last month
- Bazel rules for interacting with bazel build artifacts and bringing them into your workspace☆10Jul 24, 2024Updated last year
- ☆10Dec 3, 2024Updated last year
- Model Server Template. Used to expose custom models to the LangSmith Playground☆17Jun 14, 2024Updated last year