allenai / discoveryworld
A virtual environment for developing and evaluating automated scientific discovery agents.
☆117 · Updated last week
Alternatives and similar repositories for discoveryworld:
Users who are interested in discoveryworld are comparing it to the repositories listed below.
- Repository for the paper Stream of Search: Learning to Search in Language ☆119 · Updated 5 months ago
- ☆76 · Updated 6 months ago
- Dataset and benchmark for assessing LLMs in translating natural language descriptions of planning problems into PDDL ☆48 · Updated 3 months ago
- Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf) ☆48 · Updated 5 months ago
- A benchmark that challenges language models to code solutions for scientific problems ☆97 · Updated this week
- ☆140 · Updated 8 months ago
- Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file. ☆154 · Updated 2 months ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆234 · Updated 3 months ago
- Discovering Data-driven Hypotheses in the Wild ☆51 · Updated last month
- Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024] ☆107 · Updated last month
- 🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap… ☆134 · Updated last month
- ☆51 · Updated last week
- ☆89 · Updated this week
- Can Language Models Solve Olympiad Programming? ☆108 · Updated this week
- Replicating O1 inference-time scaling laws ☆70 · Updated last month
- Benchmarking Agentic LLM and VLM Reasoning On Games ☆88 · Updated last week
- ☆97 · Updated 3 weeks ago
- Governance of the Commons Simulation (GovSim) ☆31 · Updated 6 months ago
- Code for the paper "VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment" ☆111 · Updated 2 months ago
- 🌾 OAT: Online AlignmenT for LLMs ☆81 · Updated 3 weeks ago
- Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation ☆38 · Updated last year
- ☆115 · Updated 3 months ago
- A suite of open-ended, non-imitative tasks involving generalizable skills for large language model chatbots and agents to enable bootstra… ☆31 · Updated last month
- Bootstrapping ARC ☆90 · Updated last month
- OpenCoconut implements a latent reasoning paradigm where we generate thoughts before decoding. ☆154 · Updated this week
- Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization' ☆175 · Updated last month
- A curated collection of LLM reasoning and planning resources, including key papers, limitations, benchmarks, and additional learning mate… ☆211 · Updated 4 months ago
- Latent Program Network (from the "Searching Latent Program Spaces" paper) ☆42 · Updated last month
- ☆115 · Updated this week
- Learning Universal Predictors ☆72 · Updated 5 months ago