benediktstroebl / agent-evals
☆25Updated 8 months ago
Alternatives and similar repositories for agent-evals
Users who are interested in agent-evals are comparing it to the libraries listed below.
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment☆61Updated last year
- A framework for pitting LLMs against each other in an evolving library of games ⚔☆35Updated 9 months ago
- EMNLP 2024 "Re-reading improves reasoning in large language models". Simply repeating the question to get bidirectional understanding for…☆28Updated last year
- ☆29Updated last month
- ☆56Updated last year
- Code and data for the paper "Why think step by step? Reasoning emerges from the locality of experience"☆62Updated 10 months ago
- Source code for the collaborative reasoner research project at Meta FAIR.☆112Updated 9 months ago
- The code implementation of MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models…☆40Updated 2 years ago
- SCREWS: A Modular Framework for Reasoning with Revisions☆27Updated 2 years ago
- Codebase accompanying the Summary of a Haystack paper.☆80Updated last year
- ☆61Updated 7 months ago
- ☆141Updated 4 months ago
- The original Shared Recurrent Memory Transformer implementation☆33Updated 7 months ago
- UQ: Assessing Language Models on Unsolved Questions☆30Updated 5 months ago
- LLM reads a paper and produces a working prototype☆60Updated 10 months ago
- ☆67Updated 10 months ago
- QAlign is a new test-time alignment approach that improves language model performance by using Markov chain Monte Carlo methods.☆26Updated last month
- Learning to Retrieve by Trying - Source code for Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval☆51Updated last year
- ☆23Updated last year
- Official repository for "BLEUBERI: BLEU is a surprisingly effective reward for instruction following"☆31Updated 8 months ago
- LLM Dynamic Planner - Combining LLM with PDDL Planners to solve an embodied task☆48Updated last year
- ReBase: Training Task Experts through Retrieval Based Distillation☆29Updated last year
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL☆65Updated last year
- Simple repository for training small reasoning models☆49Updated last year
- Simple GRPO scripts and configurations.☆59Updated last year
- ☆22Updated 11 months ago
- ☆41Updated last year
- ☆54Updated 3 weeks ago
- Verifiers for LLM Reinforcement Learning☆80Updated 9 months ago