benediktstroebl / agent-evals
☆25Updated 8 months ago
Alternatives and similar repositories for agent-evals
Users that are interested in agent-evals are comparing it to the libraries listed below
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Updated last year
- ☆56Updated last year
- ☆141Updated 4 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆35Updated 9 months ago
- ☆61Updated 7 months ago
- accompanying material for sleep-time compute paper☆119Updated 9 months ago
- Codebase accompanying the Summary of a Haystack paper.☆80Updated last year
- ☆29Updated last month
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…☆28Updated last year
- ☆23Updated last year
- Simple repository for training small reasoning models☆49Updated last year
- LLM reads a paper and produces a working prototype☆60Updated 10 months ago
- SCREWS: A Modular Framework for Reasoning with Revisions☆27Updated 2 years ago
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated last year
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆69Updated last year
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"☆62Updated 10 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆40Updated 2 years ago
- ☆24Updated 11 months ago
- The original Shared Recurrent Memory Transformer implementation☆33Updated 7 months ago
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆34Updated 9 months ago
- Training Proactive and Personalized LLM Agents☆98Updated 3 weeks ago
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval☆51Updated last year
- Understanding the correlation between different LLM benchmarks☆29Updated 2 years ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆112Updated 9 months ago
- ☆67Updated 10 months ago
- ☆35Updated 8 months ago
- UQ: Assessing Language Models on Unsolved Questions☆30Updated 5 months ago
- ☆39Updated last year
- Tree prompting: easy-to-use scikit-learn interface for improved prompting.☆41Updated 2 years ago
- Q-Probe: A Lightweight Approach to Reward Maximization for Language Models☆40Updated last year