JoshuaPurtell / SmallBench
Small, simple agent task environments for training and evaluation
☆18 Updated 4 months ago
Alternatives and similar repositories for SmallBench:
Users who are interested in SmallBench are comparing it to the libraries listed below.
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆54 Updated 6 months ago
- ☆48 Updated 3 months ago
- Data preparation code for CrystalCoder 7B LLM ☆44 Updated 9 months ago
- ☆59 Updated 10 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models ☆21 Updated 3 months ago
- Functional Benchmarks and the Reasoning Gap ☆84 Updated 5 months ago
- ☆14 Updated last month
- NeurIPS 2023 - Cappy: Outperforming and Boosting Large Multi-Task LMs with a Small Scorer ☆41 Updated 11 months ago
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?" ☆49 Updated 2 months ago
- ☆48 Updated last year
- ☆38 Updated 7 months ago
- Code for the paper: CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models ☆14 Updated last month
- ☆27 Updated 3 months ago
- ☆31 Updated 8 months ago
- Script for processing OpenAI's PRM800K process supervision dataset into an Alpaca-style instruction-response format ☆27 Updated last year
- The repository contains code for Adaptive Data Optimization ☆20 Updated 2 months ago
- Repository containing the SPIN experiments on the DIBT 10k ranked prompts ☆24 Updated 11 months ago
- Aioli: A unified optimization framework for language model data mixing ☆21 Updated last month
- ☆20 Updated last year
- LLM reads a paper and produces a working prototype ☆48 Updated last month
- LLMs as Collaboratively Edited Knowledge Bases ☆44 Updated last year
- ☆50 Updated 3 months ago
- Synthetic data derived by templating, few-shot prompting, transformations on public domain corpora, and Monte Carlo tree search. ☆31 Updated this week
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆163 Updated 2 weeks ago