benediktstroebl / agent-evals
☆25Updated 5 months ago
Alternatives and similar repositories for agent-evals
Users who are interested in agent-evals are comparing it to the libraries listed below.
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆60Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆34Updated 7 months ago
- ☆73Updated last month
- SCREWS: A Modular Framework for Reasoning with Revisions☆27Updated 2 years ago
- LLM reads a paper and produces a working prototype☆57Updated 7 months ago
- ☆77Updated last week
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval☆51Updated last year
- ☆55Updated last year
- Matrix (Multi-Agent daTa geneRation Infra and eXperimentation framework) is a versatile engine for multi-agent conversational data genera…☆100Updated last week
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated 9 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…☆27Updated 11 months ago
- ☆60Updated 4 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆33Updated 6 months ago
- Understanding the correlation between different LLM benchmarks☆29Updated last year
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆37Updated last year
- Simple repository for training small reasoning models☆45Updated 9 months ago
- Source code of "How to Correctly do Semantic Backpropagation on Language-based Agentic Systems" 🤖☆76Updated 11 months ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆105Updated 7 months ago
- ☆29Updated 3 months ago
- Simple GRPO scripts and configurations.☆59Updated 9 months ago
- ☆24Updated last year
- Verifiers for LLM Reinforcement Learning☆79Updated 7 months ago
- Official Code Release for "Training a Generally Curious Agent"☆38Updated 5 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆63Updated 11 months ago
- accompanying material for sleep-time compute paper☆117Updated 6 months ago
- ☆27Updated last year
- Synthetic data generation and benchmark implementation for "Episodic Memories Generation and Evaluation Benchmark for Large Language Mode…☆56Updated last month
- Automatic Prompt Optimization☆46Updated last year
- ☆67Updated 7 months ago
- ☆51Updated last year