Discovering Data-driven Hypotheses in the Wild
☆136 · Jun 9, 2025 · Updated 9 months ago
Alternatives and similar repositories for discoverybench
Users who are interested in discoverybench are comparing it to the repositories listed below.
- [ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery ☆132 · Mar 5, 2026 · Updated 2 weeks ago
- [EMNLP 2024 Findings] Benchmarking Language Model Agents for Data-Driven Science ☆35 · Oct 25, 2024 · Updated last year
- BioDiscoveryAgent is an LLM-based AI agent for closed-loop design of genetic perturbation experiments ☆99 · Jul 6, 2025 · Updated 8 months ago
- Dataset and annotations for ASSETS 2022 publication ☆12 · Oct 6, 2022 · Updated 3 years ago
- ☆10 · Nov 6, 2024 · Updated last year
- Automated Hypothesis Testing with Agentic Sequential Falsifications ☆251 · May 14, 2025 · Updated 10 months ago
- Code release for "CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning", ICLR 2025 ☆29 · Apr 21, 2025 · Updated 11 months ago
- A curated list of papers on LLMs and agents for scientific research and development ☆86 · Dec 11, 2024 · Updated last year
- Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification ☆11 · Nov 15, 2023 · Updated 2 years ago
- ☆119 · Updated this week
- Reproducible and flexible LLM evaluations for scientific reasoning. ☆26 · Jul 23, 2025 · Updated 8 months ago
- If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions ☆17 · Apr 4, 2024 · Updated last year
- PyTorch implementation of DeepNovoV2, a state-of-the-art de novo peptide sequencing model. ☆27 · May 21, 2019 · Updated 6 years ago
- S2ORC: The Semantic Scholar Open Research Corpus: https://www.aclweb.org/anthology/2020.acl-main.447/ ☆1,022 · Apr 26, 2024 · Updated last year
- Benchmark for LLM-based Agents in Computational Biology ☆83 · Oct 6, 2025 · Updated 5 months ago
- Headway - Selenium Maven TestNG POM Data Driven Framework ☆18 · Jul 2, 2025 · Updated 8 months ago
- AIRA-dojo: a framework for developing and evaluating AI research agents ☆135 · Mar 12, 2026 · Updated last week
- ☆28 · Jun 5, 2025 · Updated 9 months ago
- Reasoning by Communicating with Agents ☆29 · Apr 29, 2025 · Updated 10 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆46 · Feb 18, 2025 · Updated last year
- ☆64 · Apr 25, 2020 · Updated 5 years ago
- [NeurIPS 2024] OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI ☆107 · Mar 6, 2025 · Updated last year
- Learning to route instances for Human vs AI Feedback (ACL Main '25) ☆27 · Jul 23, 2025 · Updated 8 months ago
- Web-scraping tool for downloading content from the following publishers: Elsevier, RSC, Web of Science, Springer Nature, Wiley. ☆37 · Jun 24, 2025 · Updated 9 months ago
- Code/data for MARG (multi-agent review generation) ☆63 · Mar 5, 2026 · Updated 2 weeks ago
- ☆12 · Feb 11, 2026 · Updated last month
- ☆17 · Jan 29, 2026 · Updated last month
- Simple and scalable tools for data-driven pretraining data selection. ☆29 · Jun 9, 2025 · Updated 9 months ago
- Benchmarking LLMs with Challenging Tasks from Real Users ☆247 · Nov 3, 2024 · Updated last year
- ☆33 · Feb 11, 2025 · Updated last year
- Data and code for the paper "CiteWorth: Cite-Worthiness Detection for Improved Scientific Document Understanding" ☆14 · Sep 8, 2022 · Updated 3 years ago
- The Platform for Self-Improving Code. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other opt… ☆36 · Updated this week
- Code for Estimating Multi-cause Treatment Effects via Single-cause Perturbation (NeurIPS 2021) ☆14 · Jan 5, 2022 · Updated 4 years ago
- ☆38 · Oct 24, 2024 · Updated last year
- Example workflow for our data-centric speech benchmark ☆17 · Jul 6, 2023 · Updated 2 years ago
- ☆247 · Updated this week
- ☆88 · Dec 15, 2023 · Updated 2 years ago
- Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task. ☆159 · Sep 9, 2025 · Updated 6 months ago
- AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D. ☆1,178 · Feb 12, 2026 · Updated last month