Inference server benchmarking tool
☆145Oct 2, 2025Updated 5 months ago
Alternatives and similar repositories for inference-benchmarker
Users who are interested in inference-benchmarker are comparing it to the libraries listed below.
- Lightweight Python Wrapper for OpenVINO, enabling LLM inference on NPUs☆27Dec 17, 2024Updated last year
- Minimal implementation of a Byte Pair Encoding (BPE) tokenizer in Zig☆14Apr 7, 2025Updated 11 months ago
- Crossword puzzles in your terminal.☆23Feb 4, 2026Updated last month
- ☆29Nov 18, 2025Updated 4 months ago
- An OpenAI API compatible images server to generate or manipulate images.☆17Feb 2, 2025Updated last year
- All-in-one environment to use Dria, the collective knowledge for AI.☆14Mar 15, 2024Updated 2 years ago
- Prometheus exporter for Linux based GDDR6/GDDR6X VRAM and GPU Core Hot spot temperature reader for NVIDIA RTX 3000/4000 series GPUs.☆24Oct 2, 2024Updated last year
- Python SDK for FirstBatch: Real-time personalization using vectorDBs☆17Nov 26, 2023Updated 2 years ago
- ☆17Dec 16, 2024Updated last year
- ☆15Jun 12, 2024Updated last year
- 👷 Build compute kernels☆215Jan 27, 2026Updated 2 months ago
- The backend behind the LLM-Perf Leaderboard☆11May 5, 2024Updated last year
- Building and caching nixpkgs with cudaSupport=true. We push to https://cuda-maintainers.cachix.org/☆23Nov 28, 2024Updated last year
- Personal voice assistant, with voice interruption and Twilio support☆18Feb 24, 2025Updated last year
- Efficient platform for inference and serving local LLMs including an OpenAI compatible API server.☆624Updated this week
- Leveraging LLMs for modernization through intelligent chunking, iterative prompting and reflection, and retrieval augmented generation (R…☆39Mar 3, 2026Updated 3 weeks ago
- Build compute kernels and load them from the Hub.☆536Updated this week
- Modified Beam Search with periodical restart☆12Sep 12, 2024Updated last year
- Various LLM Benchmarks☆24Feb 20, 2026Updated last month
- QJL: 1-Bit Quantized JL transform for KV Cache Quantization with Zero Overhead☆38Jan 27, 2025Updated last year
- Small tools to enhance your AI app with little effort.☆12Jan 9, 2024Updated 2 years ago
- 🏋️ A unified multi-backend utility for benchmarking Transformers, Timm, PEFT, Diffusers and Sentence-Transformers with full support of O…☆333Sep 25, 2025Updated 6 months ago
- A Lightweight Library for AI Observability☆254Feb 20, 2025Updated last year
- A blueprint for AI development, focusing on applied examples of RAG, information extraction, analysis and fine-tuning in the age of LLMs …☆64Feb 6, 2025Updated last year
- Virtualized Elastic KV Cache for Dynamic GPU Sharing and Beyond☆825Updated this week
- ☆25Mar 9, 2026Updated 3 weeks ago
- ☆22Sep 15, 2025Updated 6 months ago
- Go packages useful on windows☆10May 22, 2015Updated 10 years ago
- A template code for running modular and reproducible experiments in pytorch☆13Sep 3, 2025Updated 6 months ago
- A simple no-install web UI for Ollama and OAI-Compatible APIs!☆31Jan 30, 2025Updated last year
- [NeurIPS 2024 Main Track] Code for the paper titled "Instruction Tuning With Loss Over Instructions"☆38May 24, 2024Updated last year
- ☆12Jan 19, 2024Updated 2 years ago
- RepGhostNetV2: When RepGhost meets MobileNetV4☆16May 29, 2024Updated last year
- SGLang kernel library for NPU☆109Updated this week
- Dria SDK is for building and executing synthetic data generation pipelines on Dria Knowledge Network.☆28Apr 3, 2025Updated 11 months ago
- The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.☆14Mar 30, 2024Updated 2 years ago
- Bazel rules for interacting with bazel build artifacts and bringing them into your workspace☆10Jul 24, 2024Updated last year
- ☆10Dec 3, 2024Updated last year
- A git-style way of managing LLM chats☆26Jan 26, 2026Updated 2 months ago