Discovering Data-driven Hypotheses in the Wild
☆131 · Jun 9, 2025 · Updated 8 months ago
Alternatives and similar repositories for discoverybench
Users who are interested in discoverybench are comparing it to the libraries listed below.
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆124 · Aug 26, 2025 · Updated 6 months ago
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science ☆34 · Oct 25, 2024 · Updated last year
- A virtual environment for developing and evaluating automated scientific discovery agents. ☆200 · Mar 10, 2025 · Updated 11 months ago
- Dataset and annotations for ASSETS 2022 publication ☆12 · Oct 6, 2022 · Updated 3 years ago
- BioDiscoveryAgent is an LLM-based AI agent for closed-loop design of genetic perturbation experiments ☆97 · Jul 6, 2025 · Updated 7 months ago
- [ACL 2024] <Large Language Models for Automated Open-domain Scientific Hypotheses Discovery>. It has also received the best poster award … ☆42 · Oct 28, 2024 · Updated last year
- ☆16 · Jan 29, 2026 · Updated last month
- Repository containing dataset, models and code associated with the CHIME project ☆17 · Aug 22, 2024 · Updated last year
- Automated Hypothesis Testing with Agentic Sequential Falsifications ☆246 · May 14, 2025 · Updated 9 months ago
- Meta-Analysis for JAMOVI ☆11 · Nov 11, 2017 · Updated 8 years ago
- ScienceWorld is a text-based virtual environment centered around accomplishing tasks from the standardized elementary science curriculum. ☆337 · Dec 3, 2025 · Updated 3 months ago
- Code for Estimating Multi-cause Treatment Effects via Single-cause Perturbation (NeurIPS 2021) ☆14 · Jan 5, 2022 · Updated 4 years ago
- A Tool to Estimate the Time Needed to Conduct a Systematic Review or Systematic Map ☆16 · Jul 29, 2022 · Updated 3 years ago
- Reproducible and flexible LLM evaluations for scientific reasoning. ☆26 · Jul 23, 2025 · Updated 7 months ago
- A benchmark that challenges language models to code solutions for scientific problems ☆176 · Updated this week
- Example workflow for our data-centric speech benchmark ☆17 · Jul 6, 2023 · Updated 2 years ago
- ☆49 · Apr 4, 2025 · Updated 11 months ago
- Protein prediction models implemented with Modal ☆30 · Feb 22, 2026 · Updated last week
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions ☆17 · Apr 4, 2024 · Updated last year
- This is the repository for paper EscapeBench: Pushing Language Models to Think Outside the Box ☆18 · Dec 19, 2024 · Updated last year
- The Platform for Self-Improving Code. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other opt… ☆30 · Updated this week
- Benchmarking LLMs with Challenging Tasks from Real Users ☆246 · Nov 3, 2024 · Updated last year
- Code/data for MARG (multi-agent review generation) ☆59 · Sep 30, 2025 · Updated 5 months ago
- Benchmark for LLM-based Agents in Computational Biology ☆72 · Oct 6, 2025 · Updated 4 months ago
- Official code for the paper "Contrastive Representations for Temporal Reasoning". ☆52 · Nov 25, 2025 · Updated 3 months ago
- ☆29 · Mar 22, 2024 · Updated last year
- SmartPlay is a benchmark for Large Language Models (LLMs). Uses a variety of games to test various important LLM capabilities as agents. … ☆146 · Apr 11, 2024 · Updated last year
- ☆229 · Updated this week
- ☆133 · Oct 16, 2025 · Updated 4 months ago
- ☆28 · Nov 10, 2025 · Updated 3 months ago
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ☆155 · Sep 9, 2025 · Updated 5 months ago
- ☆330 · Jun 19, 2024 · Updated last year
- ☆27 · Jun 5, 2025 · Updated 8 months ago
- This is an official implementation for "Block Selection Method for Using Feature Norm in Out-of-distribution Detection", CVPR 2023. ☆24 · May 21, 2024 · Updated last year
- [COLM 2025] EvalTree: Profiling Language Model Weaknesses via Hierarchical Capability Trees ☆31 · Jul 11, 2025 · Updated 7 months ago
- ☆32 · Feb 11, 2025 · Updated last year
- [AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research ☆30 · Aug 6, 2024 · Updated last year
- ☆29 · Oct 24, 2025 · Updated 4 months ago
- A Large-Scale, High-Quality Math Dataset for Reinforcement Learning in Language Models ☆72 · Feb 25, 2025 · Updated last year