Awesome AI Benchmarks
☆32 · Jan 16, 2026 · Updated 4 months ago
Alternatives and similar repositories for awesome-ai-benchmarks
Users that are interested in awesome-ai-benchmarks are comparing it to the libraries listed below.
- LLM caching proxy server that emulates popular LLMs with the ability to simulate failures ☆76 · Aug 4, 2025 · Updated 9 months ago
- Code repository for CISO agent as part of ITBench ☆20 · May 8, 2025 · Updated last year
- Experiments on using ChatGPT for failure mode classification ☆12 · Sep 20, 2023 · Updated 2 years ago
- Use an appropriate mix of LLMs based on https://nuenki.app/blog research to translate languages better than any one tool. ☆27 · Jun 23, 2025 · Updated 11 months ago
- Survey paper: From Static Templates to Dynamic Runtime Graphs: A Survey of Workflow Optimization for LLM Agents. ☆54 · Apr 3, 2026 · Updated last month
- A simple, easy-to-customize pipeline for local RAG evaluation. Starter prompts and metric definitions included. ☆24 · Jan 14, 2026 · Updated 4 months ago
- Visually select, search, and copy your code into your clipboard for LLM context. ☆26 · May 18, 2025 · Updated last year
- ☆23 · Feb 28, 2025 · Updated last year
- FailureSensorIQ, a dataset and benchmark to probe LLMs’ reasoning and comprehension of sensor–failure relationships in industrial systems… ☆43 · May 19, 2026 · Updated last week
- [AAAI'25] The implementation of paper "Federated Foundation Models on Heterogeneous Time Series" | The first work to explore time series … ☆23 · May 10, 2026 · Updated 2 weeks ago
- Analyze Reddit posts ☆31 · Feb 27, 2025 · Updated last year
- Create text chunks which end at natural stopping points without using a tokenizer ☆26 · Nov 26, 2025 · Updated 6 months ago
- MLflow deployment plugin For IBM-cloud-watson-ml ☆15 · May 7, 2025 · Updated last year
- A tool for adding function calling to llm api, available as a service by following the link ☆22 · Aug 11, 2025 · Updated 9 months ago
- ACPBench: Reasoning about Action, Change, and Planning. A benchmark designed to evaluate the fundamental reasoning abilities in the dom… ☆33 · Feb 11, 2026 · Updated 3 months ago
- Efficient and readable change point detection package implemented in Python. (Singular Spectrum Transformation - SST, IKA-SST, ulSIF, RuL… ☆35 · May 12, 2026 · Updated 2 weeks ago
- Code repository for SRE agent as part of ITBench ☆19 · Sep 9, 2025 · Updated 8 months ago
- Lightning fast code searching made easy ☆18 · Jul 20, 2024 · Updated last year
- Emulating SAMSUNG HM641JI HDD firmware using Unicorn ☆11 · Sep 19, 2022 · Updated 3 years ago
- [KDD 2025] AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation ☆34 · Nov 18, 2025 · Updated 6 months ago
- A bit-array manipulation library in C ☆11 · Oct 29, 2021 · Updated 4 years ago
- ChatGPT CSS style ☆14 · Apr 28, 2024 · Updated 2 years ago
- The accompany backend for PAI app ☆12 · Mar 24, 2025 · Updated last year
- PoC of injecting code into a running Linux process ☆22 · Sep 11, 2019 · Updated 6 years ago
- ☆12 · May 30, 2025 · Updated 11 months ago
- A bot that provides Youtube vid chapters on Twitter (a.k.a. X) ☆12 · Feb 5, 2025 · Updated last year
- Wallaby create-react-app TypeScript ☆11 · Jan 3, 2023 · Updated 3 years ago
- In-Situ Evaluator: Real-Time Subsample Analysis ☆15 · Jan 25, 2026 · Updated 4 months ago
- Better Encrypted Datastore is a library for securely storing encrypted data inside Datastore. In addition, the library extends Datastore'… ☆13 · Mar 23, 2025 · Updated last year
- llm-eval-simple is a simple LLM evaluation framework with intermediate actions and prompt pattern selection ☆68 · Feb 28, 2026 · Updated 3 months ago
- Experiments with compile-time metaprogramming ☆11 · Dec 29, 2025 · Updated 5 months ago
- A privacy-focused, censorship-resistant multinet Android radio player built with Claude Code. Supports I2P, clearnet and Tor streaming. ☆55 · May 11, 2026 · Updated 2 weeks ago
- A file dialog library for Dear ImGui ☆14 · May 12, 2026 · Updated 2 weeks ago
- Wrapper around Ghidra's analyzeHeadless script ☆13 · Feb 5, 2022 · Updated 4 years ago
- ☆16 · Feb 1, 2025 · Updated last year
- A web site for uploading and sharing solutions for the game TIS-100. ☆13 · Jun 22, 2025 · Updated 11 months ago
- ☆19 · Jun 11, 2025 · Updated 11 months ago
- A toy for exploring arbitrary MAP rules (life-like rules, isotropic rules and so on) ☆15 · Dec 11, 2025 · Updated 5 months ago
- A rust FTP client implementation. ☆14 · Jul 13, 2024 · Updated last year