facebookresearch/airs-bench

AIRS-Bench: an AI Research Science benchmark for quantifying the end-to-end AI research abilities of LLM agents

☆100

Alternatives and similar repositories for airs-bench

Users who are interested in airs-bench are comparing it to the repositories listed below.

NJUNLP / Hallu-PI
The code and datasets of our ACM MM 2024 paper "Hallu-PI: Evaluating Hallucination in Multi-modal Large Language Models within Perturbed …
☆11 · Sep 27, 2024 · Updated last year
wildminder / ComfyUI-KaniTTS
ComfyUI node for modular, human‑like Kani TTS. Generate natural, high‑quality speech from text
☆38 · Oct 17, 2025 · Updated 8 months ago
itmo-ai / YSC-2023-Papers
YSC 2023 Papers: A complete collection of research papers, code and data from the International Young Scientists Conference 2023 for youn…
☆12 · Jan 17, 2024 · Updated 2 years ago
paulpogoda / OSINT-Tools-Kyrgyzstan
OSINT for Kyrgyz Republic
☆17 · Apr 13, 2025 · Updated last year
MLE-Dojo / MLE-Dojo
☆98 · Oct 30, 2025 · Updated 8 months ago
georgevetticaden / 3-amigo-agents
☆23 · Jul 10, 2025 · Updated 11 months ago
ITMO-NSS-team / composite-flood-forecast
Code for paper "Short-term River Flood Forecasting using Composite Models and Automated Machine Learning: the Case Study of Lena River"
☆12 · Dec 9, 2021 · Updated 4 years ago
govtech-responsibleai / KnowOrNot
☆28 · Feb 11, 2026 · Updated 4 months ago
FareedKhan-dev / 14-rag-failures
Encountering 14 different naive RAG failures and using a knowledge graph (KG) to solve them
☆27 · Dec 4, 2025 · Updated 7 months ago
RUCKBReasoning / DPO_Text2SQL
[ACL 2025] Uncovering the Impact of Chain-of-Thought Reasoning for Direct Preference Optimization: Lessons from Text-to-SQL
☆16 · Oct 9, 2025 · Updated 9 months ago
linhaowei1 / SLD
[ICLR26] AI-based scaling law discovery
☆31 · Jan 30, 2026 · Updated 5 months ago
jrzkaminski / ITMO-beamer
This is an unofficial ITMO beamer template made by me. Please feel free to use it and contribute.
☆15 · Oct 10, 2023 · Updated 2 years ago
pavviaz / DeepScriptum
Convert any PDF into its LaTeX source
☆18 · May 15, 2025 · Updated last year
andreygetmanov / science_art_at_least_once_a_week
Source code for https://t.me/science_art_at_least_once_a_week channel
☆16 · Jun 15, 2024 · Updated 2 years ago
aaronng91 / semantic-turn-detection
Script to demonstrate how to use a Language Model for Semantic Turn Detection. Refer to blog post for full details.
☆18 · May 9, 2025 · Updated last year
distil-labs / distil-example-text2sql-with-claude
Example repo showcasing model training and deployment with distil claude cli skill
☆54 · Jan 19, 2026 · Updated 5 months ago
wdimmy / Var2Vec
The code is for our AAAI2023 paper: Efficient Embeddings of Logical Variables for Query Answering over Incomplete Knowledge Graphs (Ding…
☆10 · Dec 17, 2022 · Updated 3 years ago
cognizant-ai-lab / neuro-san-benchmarking
General benchmarking apparatus for running multi-agent systems against benchmarks
☆46 · Apr 13, 2026 · Updated 2 months ago
lupantech / Eubiota
☆57 · Mar 3, 2026 · Updated 4 months ago
EricPerbos / GTX-vs-RTX-Deep-Learning-benchmarks
☆17 · Mar 12, 2019 · Updated 7 years ago
HKUST-KnowComp / WFRE
Wasserstein-Fisher-Rao Embedding: Logical Query Embeddings with Local Comparison and Global Transport (Findings-ACL 2023)
☆13 · May 4, 2023 · Updated 3 years ago
JakobEliasWagner / NeuralOperators
Neural Operators with Applications to the Helmholtz Equation
☆11 · Sep 19, 2024 · Updated last year
alan-turing-institute / t0-1
Application of Retrieval-Augmented Reasoning on a domain-specific body of knowledge
☆35 · Feb 27, 2026 · Updated 4 months ago
OpenBrowserAI / openbrowser
OpenBrowser is an open-source, AI-native browser built on Chromium — a truly privacy-first alternative to ChatGPT Atlas, Perplexity Comet…
☆56 · Feb 24, 2026 · Updated 4 months ago
Tianshi-Xu / Life-Harness
Official implementation of "Life-Harness"
☆210 · Jun 2, 2026 · Updated last month
megagonlabs / blue
Blue is an open-source framework for building enterprise-ready agentic workflows through compound AI system architecture. Blue uses strea…
☆21 · Apr 6, 2026 · Updated 3 months ago
inworld-ai / prompt-brewery
Prompt Brewery
☆54 · Aug 8, 2025 · Updated 11 months ago
kubernetes-up-and-running / helloworld
Helloworld example application.
☆12 · Jan 24, 2016 · Updated 10 years ago
commondataio / awesome-opendata-software
Awesome list of the software tools related to opendata: data catalogs, ingestion tools, data prep tools and so on
☆37 · Oct 28, 2025 · Updated 8 months ago
svatasoiu / algorithmic-trader
My own implementation of an algorithmic trader in OCaml
☆12 · Aug 3, 2014 · Updated 11 years ago
ITMO-CODE-AI / GaMAC
☆115 · Dec 7, 2025 · Updated 7 months ago
paulpogoda / OSINT-Tools-Kazakhstan
I have compiled a list of OSINT tools that may be useful to you when conducting investigations related to Kazakhstan. Do you want me to a…
☆35 · Jan 17, 2025 · Updated last year
RoughStochVol / regularity_structure_finance
Bayer, Friz, Gassiat, Martin, Stemper (2017). A regularity structure for finance.
☆12 · Sep 29, 2017 · Updated 8 years ago
liminxian / DukeMTMC-SI-Tracklet
Unsupervised Tracklet Person Re-Identification
☆10 · Apr 29, 2019 · Updated 7 years ago
plastic-labs / openclaw-honcho
Make your OpenClaw happy, give it Honcho
☆79 · May 21, 2026 · Updated last month
allenai / super-benchmark
☆53 · Apr 4, 2025 · Updated last year
yihong-chen / ReFactorGNN
Implementation for ReFactor GNNs
☆15 · Jun 10, 2025 · Updated last year
aimclub / ai-competency-model
A model of professional competencies in the field of AI
☆31 · Jan 15, 2025 · Updated last year
martinxu9 / claude-investor
☆32 · Mar 28, 2024 · Updated 2 years ago