☆27May 28, 2025Updated last year
Alternatives and similar repositories for agent-evals
Users that are interested in agent-evals are comparing it to the libraries listed below.
- ☆10Oct 22, 2024Updated last year
- The repository for paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs"☆14Dec 16, 2024Updated last year
- Implementation of 12 AI agents evaluation techniques☆43Jul 31, 2025Updated 11 months ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset☆28Mar 6, 2024Updated 2 years ago
- [NAACL'25] "Revealing the Barriers of Language Agents in Planning"☆13Jun 22, 2025Updated last year
- Benchmarking of 1D pattern classification networks☆11Jul 19, 2023Updated 2 years ago
- ☆28May 15, 2024Updated 2 years ago
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]☆149Nov 26, 2024Updated last year
- Example code using the DSPy framework.☆20May 30, 2024Updated 2 years ago
- Creating Generative AI Apps which work☆17Apr 14, 2025Updated last year
- A Declarative Language for Expressing Partial World Knowledge to Reinforcement Learning Agents☆17Jan 19, 2024Updated 2 years ago
- This is the repository for the paper EscapeBench: Pushing Language Models to Think Outside the Box☆18Dec 19, 2024Updated last year
- ☆11Jun 11, 2024Updated 2 years ago
- Scripts to create the MLB dataset introduced in the paper Data-to-text Generation with Entity Modeling☆14Feb 9, 2021Updated 5 years ago
- ☆13Oct 11, 2023Updated 2 years ago
- A dataset for training and evaluating LLMs on decision making about "when (not) to call" functions☆65Apr 29, 2025Updated last year
- LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding (CVPR 2023)☆48Apr 28, 2023Updated 3 years ago
- Official repository of the paper MPMQA: Multimodal Question Answering on Product Manuals (AAAI 2023)☆21Nov 28, 2022Updated 3 years ago
- Random Mesh Projectors for Inverse Problems☆23Apr 13, 2021Updated 5 years ago
- ☆18Jul 15, 2019Updated 6 years ago
- ☆15Mar 26, 2024Updated 2 years ago
- Prediction of box office success using Google Trends data☆11Dec 5, 2019Updated 6 years ago
- Programming by Demonstration for Fetch☆16Aug 8, 2017Updated 8 years ago
- Check if a program only uses a subset of the Python language.☆17Nov 17, 2025Updated 7 months ago
- Demo for Neuro-Symbolic Agent (LOA)☆17Sep 27, 2022Updated 3 years ago
- Example implementation of the Proximal Policy Optimization algorithm☆18Jul 25, 2024Updated last year
- ☆16Mar 2, 2019Updated 7 years ago
- ☆21Apr 29, 2020Updated 6 years ago
- Enterprise-grade Rust implementation of Anthropic's MCP protocol☆46May 16, 2026Updated last month
- Business Data Benchmark (BDB) is a set of real-world questions to evaluate AI systems connected to business data.☆24Dec 3, 2024Updated last year
- Code for TACL 2022 paper on Data-to-text Generation with Variational Sequential Planning☆22Apr 25, 2022Updated 4 years ago
- Penpot Copilot is your AI-powered design assistant, revolutionizing the way you create.☆16Dec 16, 2024Updated last year
- A repository of Juris-M style modules☆16Jan 17, 2024Updated 2 years ago
- GPT for FACodec☆13Mar 25, 2024Updated 2 years ago
- ☆18Dec 17, 2022Updated 3 years ago
- ☆14Jul 18, 2024Updated last year
- ☆17Jun 7, 2024Updated 2 years ago
- Elora is an enchanting component library built in TypeScript for React projects.☆15Oct 9, 2021Updated 4 years ago
- Dialogue-based generation of self-driving simulation scenarios using Large Language Models☆14Oct 13, 2024Updated last year