google-research-datasets/GSM-IC

The Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built on GSM8K by adding irrelevant sentences to the problem descriptions. GSM-IC is designed to evaluate how easily language models are distracted.

☆67

Alternatives and similar repositories for GSM-IC

Users who are interested in GSM-IC are comparing it with the repositories listed below.

qtli / GSM-Plus
GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.
☆66 · Jul 8, 2024 · Updated 2 years ago
sunlab-osu / Understanding-CoT
☆88 · Jun 1, 2023 · Updated 3 years ago
Liyan06 / ChartMuseum
[NeurIPS 2025] ChartMuseum: Testing Visual Reasoning Capabilities of Large Vision-Language Models
☆24 · Apr 20, 2026 · Updated 3 months ago
iiis-ai / IterativeQuestionComposing
[AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)
☆23 · Oct 2, 2025 · Updated 9 months ago
harsh19 / Reasoning-Chains-MultihopQA
Code and Data for our EMNLP 2020 paper titled 'Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multiho…
☆28 · Feb 9, 2022 · Updated 4 years ago
DataArcTech / ChartBench
☆16 · May 15, 2025 · Updated last year
liuchengwucn / FIMO
☆38 · Jun 30, 2026 · Updated 3 weeks ago
chaochun / nlu-asdiv-dataset
☆52 · Jul 4, 2023 · Updated 3 years ago
THUNLP-MT / PGRA
Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks
☆12 · Sep 1, 2023 · Updated 2 years ago
UCSB-NLP-Chang / SelfDenoise
☆14 · May 7, 2024 · Updated 2 years ago
ahmetustun / udapter
UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi…
☆31 · Dec 5, 2022 · Updated 3 years ago
oriyor / ret-robust
Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context"
☆77 · Aug 6, 2024 · Updated last year
mlfoundations / clip_quality_not_quantity
☆28 · Oct 18, 2022 · Updated 3 years ago
OpenMOSS / Say-I-Dont-Know
[ICML'2024] Can AI Assistants Know What They Don't Know?
☆86 · Feb 5, 2024 · Updated 2 years ago
ag1988 / injecting_numeracy
The accompanying code for "Injecting Numerical Reasoning Skills into Language Models" (Mor Geva*, Ankit Gupta* and Jonathan Berant, ACL 2…
☆90 · Aug 20, 2024 · Updated last year
kyegomez / EAOT
The open source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers"
☆19 · Mar 11, 2024 · Updated 2 years ago
jderiu / spot-the-bot-code
☆13 · Mar 1, 2022 · Updated 4 years ago
veronica320 / Faithful-COT
Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".
☆169 · May 7, 2024 · Updated 2 years ago
asaparov / prontoqa
Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
☆165 · Sep 9, 2025 · Updated 10 months ago
bryanchrist / llama2-70b
Codebase for fine-tuning Llama2 70B to generate math test questions and answers.
☆11 · Aug 30, 2024 · Updated last year
xiye17 / EvalQAExpl
Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.
☆17 · Apr 25, 2021 · Updated 5 years ago
Nanami18 / Snowballed_Hallucination
☆43 · Sep 3, 2024 · Updated last year
cs-holder / Reasoning-Self-Evolution-Survey
☆54 · Mar 6, 2025 · Updated last year
LAMDA-NeSy / Self-Backtracking
☆52 · Feb 12, 2025 · Updated last year
AkariAsai / unanswerable_qa
The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".
☆28 · Jun 19, 2021 · Updated 5 years ago
dashends / CodeSyntax
Code and dataset for EMNLP 2022 Findings paper "Benchmarking Language Models for Code Syntax Understanding"
☆16 · Oct 24, 2022 · Updated 3 years ago
icip-cas / SSO
A scalable automated alignment method for large language models. Resources for "Aligning Large Language Models via Self-Steering Optimiza…
☆20 · Nov 21, 2024 · Updated last year
MJ-Jang / BECEL
☆10 · Jan 28, 2024 · Updated 2 years ago
AkariAsai / evidentiality_qa
The official implementation of "Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks" (NAACL 2022).
☆44 · Dec 25, 2022 · Updated 3 years ago
marcusm117 / IdentityChain
[ICLR 2024] Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain
☆11 · Nov 24, 2025 · Updated 7 months ago
swiseman / neighbor-tagging
☆16 · Oct 24, 2021 · Updated 4 years ago
sauc-abadal / ALT
Official repository for ALT (ALignment with Textual feedback).
☆10 · Jul 25, 2024 · Updated last year
allenai / DecomP
Repository for Decomposed Prompting
☆100 · Nov 15, 2023 · Updated 2 years ago
arkilpatel / SVAMP
NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems?
☆142 · Jun 30, 2022 · Updated 4 years ago
INK-USC / rockner
☆11 · Oct 3, 2021 · Updated 4 years ago
dayeonki / mt_feedback
Code for "Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations" [NAACL Findings 2024]
☆14 · Apr 3, 2026 · Updated 3 months ago
VerdureChen / SOS-Retrieval-Loop
Codebase of ACL2024 paper "Spiral of Silence: How is Large Language Model Killing Information Retrieval?—A Case Study on Open Domain Ques…
☆16 · Jun 4, 2024 · Updated 2 years ago
lilt / tec
Evaluation code and data for "Automatic Correction of Human Translations" [NAACL 2022].
☆19 · Dec 9, 2022 · Updated 3 years ago
alibaba / ChatLearn
A flexible and efficient training framework for large-scale alignment tasks
☆452 · Oct 23, 2025 · Updated 8 months ago