mandyyyyii/scibench

☆132

Alternatives and similar repositories for scibench

Users interested in scibench also compare it with the repositories listed below.


wenhuchen / TheoremQA
The dataset and code for the paper "TheoremQA: A Theorem-driven Question Answering dataset"
☆161 · Apr 23, 2024 · Updated 2 years ago
OpenDFM / SciEval
[AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research
☆31 · Aug 6, 2024 · Updated last year
lqtrung1998 / mwp_cot_design
☆14 · Oct 11, 2023 · Updated 2 years ago
lupantech / PromptPG
Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning".
☆165 · Dec 27, 2023 · Updated 2 years ago
TIGER-AI-Lab / TheoremQA
The official repo for "TheoremQA: A Theorem-driven Question Answering dataset" (EMNLP 2023)
☆40 · May 15, 2024 · Updated 2 years ago
openai / prm800k
800,000 step-level correctness labels on LLM solutions to MATH problems
☆2,150 · Jun 1, 2023 · Updated 3 years ago
mandyyyyii / east
☆19 · Aug 4, 2025 · Updated 11 months ago
OpenBMB / OlympiadBench
[ACL 2024] Official GitHub repo for OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scie…
☆195 · Jun 8, 2025 · Updated last year
lupantech / dl4math
Resources of deep learning for mathematical reasoning (DL4MATH).
☆375 · Dec 22, 2023 · Updated 2 years ago
OFA-Sys / gsm8k-ScRel
Code and data for "Scaling Relationship on Learning Mathematical Reasoning with Large Language Models"
☆268 · Sep 12, 2024 · Updated last year
protagolabs / odyssey-math
☆84 · Jan 25, 2025 · Updated last year
oriyor / assistantbench
Implementation of the paper "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"
☆71 · Dec 9, 2024 · Updated last year
lupantech / MathVista
MathVista: data, code, and evaluation for Mathematical Reasoning in Visual Contexts
☆367 · Sep 29, 2025 · Updated 9 months ago
ZijieH / LG-ODE
☆33 · May 30, 2022 · Updated 4 years ago
OSU-NLP-Group / ChemToolAgent
Official code repo for the paper "ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving" (previously "Tooli…
☆19 · Jun 7, 2025 · Updated last year
rookie-joe / PDA
☆36 · Jan 10, 2025 · Updated last year
allenai / Lila
A unified benchmark for math reasoning
☆90 · Jan 25, 2023 · Updated 3 years ago
albertqjiang / MMA
The official repository for the paper "Multilingual Mathematical Autoformalization"
☆39 · May 20, 2024 · Updated 2 years ago
salesforce / dialog-flow-extraction
☆15 · Jun 2, 2026 · Updated last month
jtonglet / Numerical-Hybrid-QA-Literature
A list of numerical multimodal reasoning papers and their implementations
☆11 · May 13, 2024 · Updated 2 years ago
sairin1202 / SciXGen
Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"
☆13 · Feb 14, 2022 · Updated 4 years ago
ZijieH / CG-ODE
☆38 · Sep 22, 2021 · Updated 4 years ago
genrm-star / genrm-critiques
GenRM-CoT: Data release for verification rationales
☆68 · Oct 16, 2024 · Updated last year
Jiachen-T-Wang / GREATS
☆20 · Jun 27, 2026 · Updated 3 weeks ago
lupantech / ScienceQA
Data and code for the NeurIPS 2022 paper "Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering".
☆737 · Sep 19, 2024 · Updated last year
wellecks / naturalprover
NaturalProver: Grounded Mathematical Proof Generation with Language Models
☆40 · Mar 24, 2023 · Updated 3 years ago
yichousun / Winter2021_CS249_GNN
☆54 · Dec 20, 2022 · Updated 3 years ago
zhangir-azerbayev / MetaMath
☆11 · Oct 11, 2023 · Updated 2 years ago
whyNLP / Conic10K
Conic10K: A large-scale dataset for closed-vocabulary math problem understanding. Accepted to Findings of EMNLP 2023.
☆33 · Dec 6, 2023 · Updated 2 years ago
FranxYao / GPT-Bargaining
Code for the arXiv 2023 paper "Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback"
☆206 · May 24, 2023 · Updated 3 years ago
SXKDZ / homework-template
LaTeX template for homework
☆13 · May 19, 2020 · Updated 6 years ago
TIGER-AI-Lab / Program-of-Thoughts
Data and code for Program of Thoughts [TMLR 2023]
☆317 · May 15, 2024 · Updated 2 years ago
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆89 · Aug 10, 2024 · Updated last year
THUDM / SciGLM
SciGLM: Training Scientific Language Models with Self-Reflective Instruction Annotation and Tuning (NeurIPS D&B Track 2024)
☆88 · Feb 25, 2024 · Updated 2 years ago
ahxt / G2R
[WWW 2022] Geometric Graph Representation Learning via Maximizing Rate Reduction
☆26 · May 27, 2022 · Updated 4 years ago
LLaMafia / SFT_function_learning
Exploring what LLMs are really learning during SFT
☆28 · Mar 30, 2024 · Updated 2 years ago
gzcch / Bingo
☆55 · Apr 1, 2024 · Updated 2 years ago
lupantech / IconQA
Data and code for the NeurIPS 2021 paper "IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning".
☆55 · Jan 28, 2024 · Updated 2 years ago
facebookresearch / worldsense
WorldSense benchmark for grounded reasoning in language models
☆25 · Nov 28, 2023 · Updated 2 years ago