OSU-NLP-Group/ScienceAgentBench

[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

☆149

Alternatives and similar repositories for ScienceAgentBench

Users who are interested in ScienceAgentBench are comparing it to the repositories listed below.

OSU-NLP-Group / awesome-agents4science
A curated list of papers on LLMs and agents for scientific research and development
☆96 · Dec 11, 2024 · Updated last year
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆157 · Jun 9, 2025 · Updated last year
deadshot465 / novelcrafter-mcp
An experimental desktop client for using Claude Desktop's MCP with Novelcrafter codices.
☆11 · Dec 3, 2024 · Updated last year
leynos / novelcrafter-prompts
☆14 · Apr 26, 2025 · Updated last year
luojie1024 / MossQA-mnbvc
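This project mainly organizes the open-source MOSS SFT data and converts it into the mnbvc multi-turn dialogue format. MOSS-003 covers three aspects (helpfulness, faithfulness, and harmlessness) with 3.53 million samples in total; MOSS-003 includes finer-grained helpfulness category labels, broader harmlessness data, and more dialogue turns, with 6.3 million samples in total,
☆13 · Dec 3, 2023 · Updated 2 years ago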
scosman / voicebox
Exploration: using technology to aid people who lack both the ability to speak and fine motor control.
☆21 · Oct 24, 2024 · Updated last year
OSU-NLP-Group / AutoSDT
[EMNLP'25] AutoSDT is a fully automatic pipeline to collect data-driven scientific coding tasks to train co-scientist models.
☆21 · Aug 11, 2025 · Updated 11 months ago
jonathantemplin / BayesianPsychometricModeling
Course Materials for Bayesian Psychometric Modeling
☆14 · May 14, 2019 · Updated 7 years ago
language-agent-tutorial / language-agent-tutorial.github.io
[EMNLP 2024 Tutorial] Language Agents: Foundations, Prospects, and Risks
☆10 · Nov 27, 2024 · Updated last year
magesh-technovator / awesome-ai-applications
A comprehensive survey of business use cases of AI that help businesses thrive in the digital economy
☆13 · Oct 7, 2020 · Updated 5 years ago
zoejane / zmusic-pal
https://zmusic-pal.zoejane.net. A lightweight web application for quick key and chord lookup, featuring an AI companion for deeper musi…
☆12 · Mar 1, 2025 · Updated last year
princeton-pli / hal-harness
☆308 · Jul 1, 2026 · Updated 2 weeks ago
behavioral-data / BLADE
[EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science
☆35 · Oct 25, 2024 · Updated last year
siegelz / core-bench
☆77 · Nov 23, 2025 · Updated 7 months ago
allenai / discoveryworld
A virtual environment for developing and evaluating automated scientific discovery agents.
☆218 · Mar 10, 2025 · Updated last year
ouyangzhiping / feppy
The free energy principle
☆20 · Feb 16, 2025 · Updated last year
steven-tey / awesome-url-shortener
🔗 A curated list of awesome URL shorteners
☆23 · Jan 22, 2024 · Updated 2 years ago
BochaAI / open-webui-Bocha
By leveraging Bocha AI Search API, your AI applications can now access high-quality, up-to-date knowledge from billions of web pages and…
☆21 · Feb 9, 2025 · Updated last year
scicode-bench / SciCode
A benchmark that challenges language models to code solutions for scientific problems
☆213 · Updated this week
ropensci-archive / alm
ARCHIVED R Client for the Lagotto Altmetrics Platform
☆15 · May 10, 2022 · Updated 4 years ago
OSU-NLP-Group / SeeActChromeExtension
☆18 · Jan 3, 2025 · Updated last year
OSU-NLP-Group / Mind2Web-2
[NeurIPS'25 D&B] Mind2Web-2 Benchmark: Evaluating Agentic Search with Agent-as-a-Judge
☆111 · May 17, 2026 · Updated 2 months ago
cddesja / hemp
Datasets and functions for the Handbook of Educational Measurement and Psychometrics using R.
☆24 · Apr 2, 2021 · Updated 5 years ago
ipruning / run-your-py-on-serverless-gpu
Quickly launch GPU experiments
☆14 · Apr 1, 2025 · Updated last year
yyxxrr739 / autosar-rag
This is an AUTOSAR document-specific retriever based on LLM and RAG.
☆16 · Nov 12, 2024 · Updated last year
OSU-NLP-Group / llm-planning-eval
[ACL'24] Code and data of paper "When is Tree Search Useful for LLM Planning? It Depends on the Discriminator"
☆54 · Feb 23, 2024 · Updated 2 years ago
microsoft / text-to-sql-schema-expansion-generalization
Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion
☆13 · Jul 26, 2023 · Updated 2 years ago
tliutony / causal-data-science-perspective
☆22 · Jun 17, 2024 · Updated 2 years ago
Future-House / LAB-Bench
Evaluation dataset for AI systems intended to benchmark capabilities foundational to scientific research in biology
☆120 · Sep 27, 2025 · Updated 9 months ago
WeblateOrg / hello
Hello world demonstration for Weblate
☆15 · Jan 20, 2026 · Updated 6 months ago
usail-hkust / RDAT
☆12 · Aug 5, 2023 · Updated 2 years ago
OSU-NLP-Group / WebDreamer
[TMLR'25] "Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents"
☆104 · Oct 5, 2025 · Updated 9 months ago
vale-cli / SubVale
A Sublime Text 3 client for Vale Server.
☆13 · Dec 7, 2020 · Updated 5 years ago
OSU-NLP-Group / SeeAct
[ICML'24] SeeAct is a system for generalist web agents that autonomously carry out tasks on any given website, with a focus on large mult…
☆851 · Feb 3, 2025 · Updated last year
shehuiwojiege / open-llms-next-web
open-llms-next-web, an open-source large language model web demo similar to chatgpt-next-web, supporting offline open-source LLMs and PEFT models
☆18 · May 13, 2024 · Updated 2 years ago
lamm-mit / SciAgentsDiscovery
☆627 · May 10, 2025 · Updated last year
gaurav-nelson / github-action-vale-lint
⛔️ DEPRECATED ~~ GitHub action lint with Vale ✅❎ ~~ DEPRECATED ⛔️
☆12 · Apr 14, 2020 · Updated 6 years ago
esnme / landscape
A Stylus-powered frontend CSS toolkit for building rich and beautiful web apps.
☆16 · Apr 2, 2012 · Updated 14 years ago
textbundle / textbundle.org
GitHub page for the TextBundle Markdown/text specification
☆24 · Jul 30, 2014 · Updated 11 years ago