benediktstroebl/agent-evals

benediktstroebl / agent-evals

☆27

Alternatives and similar repositories for agent-evals

Users who are interested in agent-evals are comparing it to the libraries listed below.

runchu-tian / LongPiBench
The repository for the paper "Distance between Relevant Information Pieces Causes Bias in Long-Context LLMs"
☆14 Dec 16, 2024 Updated last year
ben-eysenbach / mnm
Code to accompany the paper "Mismatched No More: Joint Model-Policy Optimization for Model-Based RL"
☆21 Oct 6, 2021 Updated 4 years ago
cartesia-ai / dev-showcase
Developer showcase of projects built on Cartesia
☆20 Aug 28, 2024 Updated last year
jschuetzke / synthetic-spectra-benchmark
Benchmarking of 1D pattern classification networks
☆11 Jul 19, 2023 Updated 3 years ago
flixstn / You-Only-Look-Once
A Rust implementation of YOLO for object detection and tracking.
☆10 Nov 17, 2022 Updated 3 years ago
axiomic-ai / axiomic
Creating Generative AI Apps which work
☆17 Apr 14, 2025 Updated last year
qiancheng0 / EscapeBench
The repository for the paper "EscapeBench: Pushing Language Models to Think Outside the Box"
☆18 Dec 19, 2024 Updated last year
LachlanGray / lmql-tree-of-thoughts
LMQL implementation of tree of thoughts
☆36 Jan 31, 2024 Updated 2 years ago
yxuansu / PlanGen
[EMNLP'21] Plan-then-Generate: Controlled Data-to-Text Generation via Planning
☆76 Jun 15, 2022 Updated 4 years ago
nikhilchandak / answer-matching
Code for the paper "Answer Matching Outperforms Multiple Choice for Language Model Evaluation"
☆18 Jul 4, 2025 Updated last year
mathpn / llm-docsmith
Generate Python docstrings automatically with LLMs and syntax trees
☆20 Jun 13, 2025 Updated last year
ratishsp / mlb-data-scripts
Scripts to create the MLB dataset introduced in the paper "Data-to-text Generation with Entity Modeling"
☆14 Feb 9, 2021 Updated 5 years ago
jonathan-roberts1 / SciFIBench
NeurIPS 2024: SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
☆13 May 24, 2025 Updated last year
lucasrowe / spoiled
A Chrome extension that blocks content using any keywords a user specifies.
☆10 Jul 16, 2020 Updated 6 years ago
StonyBrookNLP / PerSenT
[COLING2020] A challenge dataset for Person SenTiment analysis in news domain.
☆11 May 2, 2022 Updated 4 years ago
intuit-ai-research / DCR-consistency
DCR-Consistency: Divide-Conquer-Reasoning for Consistency Evaluation and Improvement of Large Language Models
☆27 May 23, 2024 Updated 2 years ago
swing-research / deepmesh
Random Mesh Projectors for Inverse Problems
☆23 Apr 13, 2021 Updated 5 years ago
arviinnd-5989 / Long-Short-Term-Memory-networks-with-Python
Book by Dr. Jason Brownlee
☆11 Sep 14, 2020 Updated 5 years ago
motional / motional-prediction-devkit
☆18 Dec 17, 2022 Updated 3 years ago
maitchison / PPO
Example implementation of the Proximal Policy Optimization algorithm
☆18 Jul 25, 2024 Updated last year
BlueBrain / neuroagent
LLM agent made to communicate with different neuroscience-related tools
☆10 Feb 26, 2025 Updated last year
neondatabase / toolkit
☆18 Jun 23, 2026 Updated last month
node-projects / vs-code-designer-addon
A VSCode Addon using the web-component-designer
☆16 Jun 12, 2026 Updated last month
anxie / meta_classifier
☆16 Mar 2, 2019 Updated 7 years ago
Azure-Samples / iot-hub-nodejs-intel-edison-vibration-anomaly-detection
Simple IoT project using Azure IoT Hub and showing a device running node to send telemetry data and that is analyzed by Azure IoT service…
☆10 Jul 13, 2017 Updated 9 years ago
hasura / business-data-benchmark
Business Data Benchmark (BDB) is a set of real-world questions to evaluate AI systems connected to business data.
☆25 Dec 3, 2024 Updated last year
uynitsuj / yumi_realtime
Realtime & high-frequency control interfaces for the YuMi IRB 14000 bi-manual robot arm including manual tele-operation and autonomous Di…
☆27 Sep 24, 2025 Updated 10 months ago
x35f / model_based_rl
Model-based reinforcement learning algorithms for unstable baselines
☆15 May 9, 2023 Updated 3 years ago
thomasnormal / fewshot
☆30 Oct 24, 2025 Updated 9 months ago
space-bacon / Semiotic-Analysis-Tool
The Semiotic Analysis Tool is a comprehensive and sophisticated Python-based application designed to analyze various sign systems within …
☆20 Dec 20, 2025 Updated 7 months ago
ratishsp / data2text-seq-plan-py
Code for TACL 2022 paper on Data-to-text Generation with Variational Sequential Planning
☆22 Apr 25, 2022 Updated 4 years ago
fsndzomga / open_source_lrm
☆10 Oct 24, 2024 Updated last year
kyegomez / dev-swarm
A swarm of LLM agents that will help you test, document, and productionize your code!
☆19 Updated this week
jobedom / obsidian-hemingway-mode
☆10 Aug 6, 2025 Updated 11 months ago
lagefreitas / predicting-brazilian-court-decisions
☆11 Jun 10, 2022 Updated 4 years ago
AidanTilgner / AutogenObsidianPlugin
A plugin to use a language model to fill in parts of notes.
☆16 Feb 20, 2024 Updated 2 years ago
StephLong614 / Causal-disco-LLM-imperfect-experts
☆17 Jun 7, 2024 Updated 2 years ago
jaehunjung1 / cascaded-selective-evaluation
☆29 Feb 24, 2025 Updated last year
google-deepmind / nonstationary_mbml
Memory-Based Meta-Learning on Non-Stationary Distributions
☆18 Mar 14, 2024 Updated 2 years ago