Provider-agnostic, open-source evaluation infrastructure for language models
☆732 · Dec 24, 2025 · Updated 2 months ago
Alternatives and similar repositories for openbench
Users who are interested in openbench compare it to the libraries listed below.
- ☆11 · Aug 26, 2024 · Updated last year
- Realtime News and Information Eval ☆17 · Nov 19, 2025 · Updated 3 months ago
- Build, enrich, and transform datasets using AI models with no code ☆1,629 · Oct 23, 2025 · Updated 4 months ago
- Groq Compound Beta MCP Server ☆44 · Feb 14, 2026 · Updated 2 weeks ago
- Inspect: A framework for large language model evaluations ☆1,783 · Updated this week
- TVRecap: A Dataset for Generating Stories with Character Descriptions ☆21 · Jun 5, 2023 · Updated 2 years ago
- Local Groq Desktop chat app with MCP support ☆382 · Feb 14, 2026 · Updated 2 weeks ago
- Build robust, production grade function calling assistants that work. Declarative and extensible. Built on top of LangChain ⚡️ ☆76 · May 21, 2024 · Updated last year
- See the device (CPU/GPU/ANE) and estimated cost for every layer in your CoreML model. ☆25 · Oct 23, 2025 · Updated 4 months ago
- Okra, your all in one personal AI assistant ☆14 · Jun 14, 2024 · Updated last year
- The LLM Evaluation Framework ☆13,787 · Feb 23, 2026 · Updated last week
- Renderer for the harmony response format to be used with gpt-oss ☆4,205 · Dec 15, 2025 · Updated 2 months ago
- Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement… ☆8,647 · Feb 23, 2026 · Updated last week
- ☆59 · Updated this week
- ☆23 · Jan 24, 2025 · Updated last year
- Composable building blocks to build LLM Apps ☆8,275 · Updated this week
- ☆13 · Nov 5, 2024 · Updated last year
- Supporting code for "LLMs for your iPhone: Whole-Tensor 4 Bit Quantization" ☆11 · Mar 31, 2024 · Updated last year
- Context Engineering Course with DSPy ☆215 · Jul 27, 2025 · Updated 7 months ago
- Python & JS/TS SDK for running AI-generated code/code interpreting in your AI app ☆2,220 · Updated this week
- Semantic search and document parsing tools for the command line ☆1,617 · Feb 16, 2026 · Updated last week
- Build production-ready AI agents in both Python and Typescript. ☆3,119 · Feb 20, 2026 · Updated last week
- Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing a… ☆37,083 · Updated this week
- Everything about the SmolLM and SmolVLM family of models ☆3,636 · Jan 13, 2026 · Updated last month
- Run evals using LLM ☆27 · Jan 8, 2026 · Updated last month
- A highly customizable, lightweight, and open-source coding CLI powered by Groq for instant iteration. ☆706 · Dec 19, 2025 · Updated 2 months ago
- Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends ☆2,311 · Feb 20, 2026 · Updated last week
- Machine Learning Agility (MLAgility) benchmark and benchmarking tools ☆40 · Jul 31, 2025 · Updated 7 months ago
- Wave - The Software as a Service Starter Kit, designed to help you build the SAAS of your dreams 🚀 💰 ☆12 · Jan 30, 2026 · Updated last month
- 🌟 Stardex: Explore GitHub Stars Intelligently. Stardex is a powerful web app that lets you search, filter, and cluster any GitHub user's… ☆13 · Jan 30, 2026 · Updated last month
- This project aims to utilize Generative AI for the next marketing strategy in the case of e-commerce customer segmentation. ☆12 · Mar 19, 2024 · Updated last year
- Recursive Self-Aggregation evals on ARC-AGI ☆28 · Jan 26, 2026 · Updated last month
- ☆16 · May 31, 2025 · Updated 9 months ago
- Design Tokens synced with Nuxt design team Figma. ☆14 · Aug 8, 2023 · Updated 2 years ago
- 🧳 A state-of-the-art multi-agent travel planning system powered by OpenAI Agents SDK and LangGraph orchestration. Leverages Stagehand/Pl… ☆13 · May 7, 2025 · Updated 9 months ago
- streamlit dashboard to analyse data ☆12 · May 6, 2023 · Updated 2 years ago
- Open-source clone of OpenAI's Deep Research. Works with any transformer, gpt4free, & runs in browser. No Firecrawl needed. ☆12 · Jun 12, 2025 · Updated 8 months ago
- Python SDK for Modaic ☆23 · Updated this week
- ☆11 · Oct 11, 2023 · Updated 2 years ago