Compendium of over 50 benchmarks for evaluating AI agents, categorized into Function Calling & Tool Use, General Assistant & Reasoning, Coding & Software Engineering, and Computer Interaction.
☆160Oct 15, 2025Updated 7 months ago
Alternatives and similar repositories for ai-agent-benchmark-compendium
Users who are interested in ai-agent-benchmark-compendium are comparing it to the repositories listed below.
- ☆65Feb 6, 2026Updated 4 months ago
- powerful and fast tool calling agents☆78Mar 19, 2025Updated last year
- Implementation of the Tower Method, a novel approach to handling missing values.☆13Mar 12, 2024Updated 2 years ago
- Skills for Cloud SQL for PostgreSQL☆39Jun 2, 2026Updated last week
- CodebaseMD: A VS Code extension that converts codebases into structured Markdown documentation, optimized for LLMs and agentic coding too…☆15May 22, 2025Updated last year
- An introduction to DSPy☆34Aug 30, 2025Updated 9 months ago
- ☆28Jun 5, 2026Updated last week
- Snowflake LLM-based text to SQL and document retrieval in Streamlit☆45Nov 16, 2023Updated 2 years ago
- ☆18Feb 15, 2025Updated last year
- ☆20Aug 6, 2025Updated 10 months ago
- Official Documentation for DSPy Library☆24Jun 5, 2026Updated last week
- A command line utility to locally index and download filings from the SEC Edgar database.☆13Feb 18, 2025Updated last year
- R package for data preprocessing☆13Dec 18, 2019Updated 6 years ago
- A quick fix model for the Charm BubbleTea ecosystem.☆16Nov 27, 2025Updated 6 months ago
- ☆18Jul 20, 2025Updated 10 months ago
- Minimal event-driven framework for Java.☆15Apr 14, 2019Updated 7 years ago
- R package for online training of regression models using FTRL Proximal☆12May 18, 2026Updated 3 weeks ago
- Enhanced assignments. Use `..` on the right hand side as a shorthand for the left hand side.☆17Mar 23, 2019Updated 7 years ago
- Convert unstructured text into structured datasets☆27Apr 15, 2026Updated last month
- ☆41Jan 19, 2026Updated 4 months ago
- Data Analytics skills for BigQuery☆44Jun 1, 2026Updated last week
- List of Papers on Attack and Defense (AD) in AI Models☆27Mar 18, 2022Updated 4 years ago
- A self-hosted, secure, feature-rich memory system for AI agents and assistants. Provides intelligent fact extraction and deduplication, w…☆148Updated this week
- themes for base plots☆15May 15, 2018Updated 8 years ago
- Advanced Text2SQL with LlamaIndex and Snowflake models☆43Oct 9, 2025Updated 8 months ago
- Shift from passive documentation to active enforcement.☆57Mar 14, 2026Updated 2 months ago
- Example implementations of Claude's Memory Tool API - Next.js web app and Python CLI for building applications with persistent memory☆53Oct 14, 2025Updated 8 months ago
- Graph Neural Network application in predicting AC Power Flow calculation. Developed with Pytorch Geometric framework. My work at NCSU for…☆14Dec 11, 2024Updated last year
- zkCREAM is a zk-SNARK-based anonymized voting application using a token mixer☆39Feb 18, 2022Updated 4 years ago
- General Utilities☆57Apr 18, 2026Updated last month
- ☆26Jun 12, 2025Updated last year
- Jigsawstack Python SDK☆20Jun 3, 2026Updated last week
- Package Information and Documentation☆16Nov 10, 2022Updated 3 years ago
- Fine-tuning llama2 using the chain-of-thought approach☆10Nov 18, 2023Updated 2 years ago
- Let Claude Code and Codex control your browser☆30Aug 30, 2025Updated 9 months ago
- [student project] UI to run SQL on Delta Lake tables and visualize how the results vary across table versions☆12Apr 21, 2020Updated 6 years ago
- ☆19May 15, 2024Updated 2 years ago
- MCP prompt tool applying Chain-of-Draft (CoD) reasoning - BYOLLM☆19Sep 8, 2025Updated 9 months ago
- ☆11Nov 29, 2017Updated 8 years ago