☆25 · May 28, 2025 · Updated 9 months ago
Alternatives and similar repositories for agent-evals
Users that are interested in agent-evals are comparing it to the libraries listed below
- A Declarative Language for Expressing Partial World Knowledge to Reinforcement Learning Agents ☆16 · Jan 19, 2024 · Updated 2 years ago
- ☆10 · Oct 22, 2024 · Updated last year
- Creating Generative AI Apps which work ☆17 · Apr 14, 2025 · Updated 10 months ago
- Spatial Spectral Machine Learning ☆14 · Oct 15, 2025 · Updated 4 months ago
- This is the repository for paper EscapeBench: Pushing Language Models to Think Outside the Box ☆18 · Dec 19, 2024 · Updated last year
- Developer showcase of projects built on Cartesia ☆20 · Aug 28, 2024 · Updated last year
- Code to accompany the paper "Mismatched No More: Joint Model-Policy Optimization for Model-Based RL" ☆21 · Oct 6, 2021 · Updated 4 years ago
- Exploration using DSPy to optimize modules to maximize performance on the OpenToM dataset ☆27 · Mar 6, 2024 · Updated last year
- ☆26 · May 15, 2024 · Updated last year
- Random Mesh Projectors for Inverse Problems ☆24 · Apr 13, 2021 · Updated 4 years ago
- ☆29 · Feb 24, 2025 · Updated last year
- This tool allows local LLM usage that can automate tasks without human intervention. The agent can call itself recursively and work on… ☆20 · May 5, 2025 · Updated 9 months ago
- Code for the paper "Trust the PRoC3S: Solving Long-Horizon Robotics Problems with LLMs and Constraint Satisfaction" presented at CoRL 202… ☆31 · Nov 18, 2024 · Updated last year
- ☆29 · Oct 24, 2025 · Updated 4 months ago
- Code and data for "Retrieval Enhanced Model for Commonsense Generation" (ACL-IJCNLP 2021). ☆29 · Dec 31, 2021 · Updated 4 years ago
- LMQL implementation of tree of thoughts ☆36 · Jan 31, 2024 · Updated 2 years ago
- Code associated with WANLI dataset in Liu et al., 2022 ☆31 · May 24, 2023 · Updated 2 years ago
- Repository containing starter templates to be used within Kodu ☆15 · Sep 26, 2024 · Updated last year
- ☆14 · Jun 24, 2024 · Updated last year
- ☆37 · May 5, 2025 · Updated 9 months ago
- Leveraging Base Language Models for Few-Shot Synthetic Data Generation ☆40 · Oct 18, 2025 · Updated 4 months ago
- Implementation of "Arc-swift: A Novel Transition System for Dependency Parsing" ☆33 · Aug 21, 2018 · Updated 7 years ago
- A dataset for training and evaluating LLMs on decision making about "when (not) to call" functions ☆55 · Apr 29, 2025 · Updated 10 months ago
- This is the repository holding code and data for "FrugalML: How to Use ML Prediction APIs More Accurately and Cheaply". ☆36 · Jan 12, 2021 · Updated 5 years ago
- Enhanced Explainable Neural Network ☆10 · Dec 25, 2021 · Updated 4 years ago
- A Python library for making API calls to Bonsai BRAIN. ☆15 · Oct 6, 2022 · Updated 3 years ago
- Code for the paper "SMACE: A New Method for the Interpretability of Composite Decision Systems", ECML 2022 ☆15 · Apr 17, 2023 · Updated 2 years ago
- ☆27 · Updated this week
- Feature space plugin for QGIS. Visualize spectral feature space ☆10 · Jul 1, 2023 · Updated 2 years ago
- Benchmarking of 1D pattern classification networks ☆10 · Jul 19, 2023 · Updated 2 years ago
- Code for the paper "Distinguishing the Knowable from the Unknowable with Language Models" ☆11 · Apr 15, 2024 · Updated last year
- An extensible MINLP solver ☆46 · Feb 5, 2026 · Updated 3 weeks ago
- ☆11 · Aug 27, 2024 · Updated last year
- ☆10 · Feb 22, 2025 · Updated last year
- This is a web app that forecasts the High and Low of the stocks, built using streamlit, and based on statsmodels, sklearn, and sktime. ☆12 · Nov 19, 2021 · Updated 4 years ago
- ☆12 · Sep 24, 2025 · Updated 5 months ago
- Official repo for paper "HiMoE-VLA: Hierarchical Mixture-of-Experts for Generalist Vision-Language-Action Policies" ☆23 · Dec 12, 2025 · Updated 2 months ago
- The AI Alliance project to define a reference stack for AI model and system evaluation, with evaluations, benchmarks, and leaderboards. ☆13 · Feb 15, 2026 · Updated 2 weeks ago
- Official implementation of SPGrasp: A framework for dynamic grasp synthesis from sparse spatiotemporal prompts. ☆19 · Jan 6, 2026 · Updated last month