allenai / discoveryworld

A virtual environment for developing and evaluating automated scientific discovery agents.

☆168

Alternatives and similar repositories for discoveryworld

Users interested in discoveryworld are comparing it to the libraries listed below.

OSU-NLP-Group / ScienceAgentBench
[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
☆94 · Updated 2 months ago
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆104 · Updated 2 months ago
LeonGuertler / TextArena
A Collection of Competitive Text-Based Games for Language Model Evaluation and Reinforcement Learning
☆238 · Updated this week
kanishkg / stream-of-search
Repository for the paper Stream of Search: Learning to Search in Language
☆150 · Updated 6 months ago
scicode-bench / SciCode
A benchmark that challenges language models to code solutions for scientific problems
☆127 · Updated this week
ZonglinY / MOOSE
[ACL 2024] "Large Language Models for Automated Open-domain Scientific Hypotheses Discovery". It has also received the best poster award …
☆42 · Updated 9 months ago
abdulhaim / LMRL-Gym
☆99 · Updated last year
SALT-NLP / collaborative-gym
Framework and toolkits for building and evaluating collaborative agents that can work together with humans.
☆91 · Updated 4 months ago
allenai / ScienceWorld
ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum.
☆283 · Updated 3 weeks ago
balrog-ai / BALROG
Benchmarking Agentic LLM and VLM Reasoning On Games
☆179 · Updated 3 weeks ago
agentification / RAFA_code
☆143 · Updated last year
Yu-Fangxu / FoR
[ICML 2025] Flow of Reasoning: Training LLMs for Divergent Reasoning with Minimal Examples
☆104 · Updated 2 weeks ago
Berkeley-NLP / Agent-Eval-Refine
Code for Paper: Autonomous Evaluation and Refinement of Digital Agents [COLM 2024]
☆140 · Updated 8 months ago
DeLLMa / DeLLMa
Official Implementation of "DeLLMa: Decision Making Under Uncertainty with Large Language Models"
☆61 · Updated 9 months ago
microsoft / stop
Self-Taught Optimizer (STOP): Recursively Self-Improving Code Generation
☆44 · Updated last year
goodfire-ai / r1-interpretability
Open source interpretability artefacts for R1.
☆157 · Updated 3 months ago
vsubramaniam851 / multiagent-ft
☆213 · Updated 5 months ago
Ber666 / RAP
Reasoning with Language Model is Planning with World Model
☆168 · Updated last year
google-deepmind / questbench
☆25 · Updated 2 months ago
ChicagoHAI / hypothesis-generation
This is the official repository for HypoGeniC (Hypothesis Generation in Context) and HypoRefine, which are automated, data-driven tools t…
☆77 · Updated last week
ekinakyurek / marc
Public repository for "The Surprising Effectiveness of Test-Time Training for Abstract Reasoning"
☆322 · Updated 8 months ago
ScalingIntelligence / Archon
Archon provides a modular framework for combining different inference-time techniques and LMs with just a JSON config file.
☆176 · Updated 5 months ago
StonyBrookNLP / appworld
🌍 Repository for "AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agent", ACL'24 Best Resource Pap…
☆234 · Updated this week
snap-stanford / MLAgentBench
☆300 · Updated last year
eth-sri / matharena
Evaluation of LLMs on latest math competitions
☆155 · Updated this week
SalesforceAIResearch / LaTRO
☆119 · Updated 5 months ago
kohjingyu / search-agents
Code for the paper 🌳 Tree Search for Language Model Agents
☆208 · Updated last year
jonathanmli / Avalon-LLM
This repository contains an LLM benchmark for the social deduction game "Resistance Avalon".
☆121 · Updated 2 months ago
giorgiopiatti / GovSim
Governance of the Commons Simulation (GovSim)
☆56 · Updated 6 months ago
OSU-NLP-Group / GrokkedTransformer
Code for NeurIPS'24 paper 'Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization'
☆226 · Updated 3 weeks ago