allenai/asta-bench

☆119

Alternatives and similar repositories for asta-bench

Users interested in asta-bench compare it to the repositories listed below.

allenai / agent-baselines
☆149 · Jun 8, 2026 · Updated last month
allenai / discoverybench
Discovering Data-driven Hypotheses in the Wild
☆157 · Jun 9, 2025 · Updated last year
allenai / neurodiscoverybench
☆22 · Jan 29, 2026 · Updated 5 months ago
Anikethh / ResearchGym
Benchmark and execution environment for evaluating LLM agents on end-to-end AI Research. [ICLR 2026]
☆35 · May 31, 2026 · Updated last month
allenai / asta-paper-finder
Frozen-in-time version of our Paper Finder agent for reproducing evaluation results
☆245 · Mar 17, 2026 · Updated 4 months ago
reka-ai / research-eval
A benchmark to evaluate search-augmented LLMs
☆17 · Aug 28, 2025 · Updated 10 months ago
InternScience / ResearchClawBench
🦞 ResearchClawBench: Evaluating AI Agents for Automated Research from Re-Discovery to New-Discovery
☆221 · Updated this week
allenai / asta-plugins
☆23 · Updated this week
OpenDFM / Xcientist
The official repo for the paper "Externalizing Research Synthesis and Validation in AI Scientists through a Research Harness"
☆29 · Updated this week
allenai / prescience
PreScience: A Benchmark for Forecasting Scientific Contributions
☆31 · May 4, 2026 · Updated 2 months ago
allenai / olmo-cookbook
OLMost every training recipe you need to perform data interventions with the OLMo family of models.
☆72 · May 29, 2026 · Updated last month
facebookresearch / airs-bench
AIRS-Bench: an AI Research Science benchmark for quantifying the end-to-end AI research abilities of LLM agents
☆104 · May 5, 2026 · Updated 2 months ago
CherYou / AutoResearchBench
Official Repo: AutoResearchBench: Benchmarking AI Agents on Complex Scientific Literature Discovery
☆55 · Apr 24, 2026 · Updated 2 months ago
yansheng-qiu / AI_Idea_Bench_2025
☆16 · May 15, 2025 · Updated last year
artificial-scientist-lab / Impact4Cast
Forecasting high-impact research topics via machine learning on evolving knowledge graphs
☆54 · Updated this week
open-compass / RePro
[ICLR 2026] Rectifying LLM Thought From Lens of Optimization
☆15 · Dec 5, 2025 · Updated 7 months ago
facebookresearch / llm-speedrunner
The Automated LLM Speedrunning Benchmark measures how well LLM agents can reproduce previous innovations and discover new ones in languag…
☆145 · May 6, 2026 · Updated 2 months ago
2toinf / IVM
[NeurIPS-2024] The official implementation of "Instruction-Guided Visual Masking"
☆42 · Nov 15, 2024 · Updated last year
uq-project / UQ
UQ: Assessing Language Models on Unsolved Questions
☆30 · Aug 26, 2025 · Updated 10 months ago
commoncrawl / cc-citations
Scientific articles using or citing Common Crawl data
☆29 · Jul 8, 2026 · Updated last week
BatsResearch / ex2
If CLIP Could Talk: Understanding Vision-Language Model Representations Through Their Preferred Concept Descriptions
☆17 · Apr 4, 2024 · Updated 2 years ago
hkust-nlp / deepsearch-tts
Pushing Test-Time Scaling Limits of Deep Search with Asymmetric Verification
☆21 · Oct 8, 2025 · Updated 9 months ago
zoranmedic / mdcr
Benchmark dataset for the evaluation of scientific article representations on the task of citation recommendation across various scientif…
☆12 · Oct 21, 2022 · Updated 3 years ago
Open-Social-World / autolibra
AutoLibra: Metric Induction for Agents from Open-Ended Human Feedback
☆19 · Apr 23, 2026 · Updated 2 months ago
OSU-NLP-Group / ScienceAgentBench
[ICLR'25] ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
☆149 · Updated this week
xiaofengShi / SPAR
☆26 · Jul 23, 2025 · Updated 11 months ago
facebookresearch / darling
Official Implementation of the paper "Jointly Reinforcing Diversity and Quality in Language Model Generations"
☆61 · May 8, 2026 · Updated 2 months ago
allenai / fluid-benchmarking
Fluid Language Model Benchmarking
☆29 · Sep 16, 2025 · Updated 10 months ago
allenai / IFBench
☆160 · May 13, 2026 · Updated 2 months ago
google / spiqa
Code release for "SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers" [NeurIPS D&B, 2024]
☆76 · Jan 13, 2025 · Updated last year
allenai / olmix
☆41 · May 26, 2026 · Updated last month
mims-harvard / OptimusKG
A modern multimodal knowledge graph with type-specific metadata across biomedical domains.
☆99 · Updated this week
hltcoe / rank-k
Repository for the listwise reranker Rank-K
☆16 · May 23, 2025 · Updated last year
openai / frontier-evals
OpenAI Frontier Evals
☆1,261 · Apr 21, 2026 · Updated 3 months ago
allenai / ai2-scholarqa-lib
Repo housing the open-source code for the Ai2 Scholar QA app and its corresponding library
☆279 · Jun 25, 2026 · Updated 3 weeks ago
allenai / figura11y
AI Assistance for Writing Scientific Alt Text
☆14 · Feb 7, 2024 · Updated 2 years ago
NVlabs / ProfBench
PhD/MBA-level human-annotated rubrics dataset across Physics, Chemistry, Finance and Consulting
☆32 · Oct 30, 2025 · Updated 8 months ago
foreverlasting1202 / QuestA
☆22 · Jan 2, 2026 · Updated 6 months ago
JuChunHuang / protein-variants-generation
Generating Protein Variants with Different Generative Models (HMM, VAE, ESM-2, ProtGPT2)
☆11 · Mar 14, 2024 · Updated 2 years ago