groq/openbench

Provider-agnostic, open-source evaluation infrastructure for language models

☆791

Alternatives and similar repositories for openbench

Users interested in openbench are comparing it to the repositories listed below.

definitive-io / human-eval-sampling-benchmark
OpenAI's human-eval sampling benchmark
☆13 · Jan 29, 2024 · Updated 2 years ago
groq / groq-changelog
Groq Public Changelog
☆18 · May 6, 2026 · Updated 2 months ago
groq / groq-desktop-beta
Local Groq Desktop chat app with MCP support
☆398 · Jun 26, 2026 · Updated 3 weeks ago
groq / realtime-eval
Realtime News and Information Eval
☆20 · Jun 26, 2026 · Updated 3 weeks ago
build-with-groq / compound-voice
A compound AI voice assistant powered by Compound on Groq, equipped with realtime search capabilities.
☆31 · Oct 20, 2025 · Updated 9 months ago
definitive-io / openassistants
Build robust, production grade function calling assistants that work. Declarative and extensible. Built on top of LangChain ⚡️
☆76 · May 21, 2024 · Updated 2 years ago
UKGovernmentBEIS / inspect_ai
Inspect: A framework for large language model evaluations
☆2,404 · Updated this week
definitive-io / code-indexer-loop
Code Indexer Loop is a Python library for indexing and retrieving source code files through an integrated vector database that's continuo…
☆175 · Apr 9, 2024 · Updated 2 years ago
groq / groq-typescript
The official Node.js / Typescript library for the Groq API
☆259 · Jul 18, 2026 · Updated last week
PrimeIntellect-ai / verifiers
Our library for RL environments + evals
☆4,400 · Updated this week
build-with-groq / groq-code-cli
A highly customizable, lightweight, and open-source coding CLI powered by Groq for instant iteration.
☆739 · Dec 19, 2025 · Updated 7 months ago
vercel / next-evals-oss
Evals for Next.js up to 15.5.6 to test AI model competency at Next.js
☆301 · Updated this week
openai / harmony
Renderer for the harmony response format to be used with gpt-oss
☆4,465 · Apr 8, 2026 · Updated 3 months ago
build-with-groq / groq-subtitle-generator
Create subtitles in various languages in mere minutes using Whisper and Qwen3-32b via Groq's lightning-fast inference.
☆94 · Dec 17, 2025 · Updated 7 months ago
huggingface / lighteval
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
☆2,495 · Jun 29, 2026 · Updated 3 weeks ago
OpenPipe / ART
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement…
☆10,529 · Updated this week
datacurve-ai / pier
Pier is a Harbor fork built for DeepSWE, with stronger support for CLI agents in air-gapped (no-internet) tasks and more faithful, consis…
☆126 · Jul 12, 2026 · Updated last week
groq / compound-mcp-server
Groq Compound Beta MCP Server
☆53 · Jun 26, 2026 · Updated 3 weeks ago
meridianlabs-ai / inspect_flow
Inspect Flow is a workflow stack built on Inspect AI that enables research organisations to run AI evaluations at scale.
☆16 · Updated this week
PrimeIntellect-ai / prime-rl
Agentic RL Training at Scale
☆1,723 · Updated this week
harbor-framework / harbor
Framework for evaluating and improving agents
☆3,475 · Updated this week
MoonshotAI / K2-Vendor-Verifier
Verify Precision of all Kimi K2 API Vendor
☆580 · Feb 14, 2026 · Updated 5 months ago
ygwyg / okiro
Spawn ephemeral, parallel versions of your codebase to ship faster with AI.
☆53 · Jan 21, 2026 · Updated 6 months ago
stanfordnlp / dspy
DSPy: The framework for programming—not prompting—language models
☆36,346 · Updated this week
groq / groq-python
The official Python Library for the Groq API
☆613 · Jul 18, 2026 · Updated last week
groq / groq-autosheet
A browser spreadsheet with an integrated AI chat (with MCP support) powered by Groq inference
☆32 · Jul 16, 2026 · Updated last week
Sakil786 / llama4_trip_planning_agent
llama4_trip_planning_agent
☆13 · Apr 5, 2025 · Updated last year
weaviate / elysia
Python package and backend for the Elysia platform app.
☆1,924 · Feb 6, 2026 · Updated 5 months ago
confident-ai / deepeval
The LLM Evaluation Framework
☆17,099 · Updated this week
MoonshotAI / checkpoint-engine
Checkpoint-engine is a simple middleware to update model weights in LLM inference engines
☆980 · Jul 4, 2026 · Updated 2 weeks ago
mlfoundations / evalchemy
Automatic evals for LLMs
☆600 · Feb 24, 2026 · Updated 5 months ago
langwatch / langwatch
The platform for LLM evaluations and AI agent testing
☆3,416 · Updated this week
ax-llm / ax
The pretty much "official" DSPy framework for Typescript
☆2,842 · Updated this week
argilla-io / distilabel
Distilabel is a framework for synthetic data and AI feedback for engineers who need fast, reliable and scalable pipelines based on verifi…
☆3,344 · Updated this week
NousResearch / atropos
Atropos is a Language Model Reinforcement Learning Environments framework for collecting and evaluating LLM trajectories through diverse …
☆1,340 · Jul 4, 2026 · Updated 2 weeks ago
run-llama / semtools
Semantic search and document parsing tools for the command line
☆1,838 · Mar 11, 2026 · Updated 4 months ago
harbor-framework / terminal-bench
A benchmark for LLMs on complicated tasks in the terminal
☆2,482 · Jul 11, 2026 · Updated 2 weeks ago
algorithmicsuperintelligence / optillm
Optimizing inference proxy for LLMs
☆4,209 · Updated this week
NVIDIA-NeMo / DataDesigner
🎨 NeMo Data Designer: Generate high-quality synthetic data from scratch or from seed data.
☆2,126 · Updated this week