Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant sentences in problem descriptions. GSM-IC is constructed to evaluate the distractibility of language models.
☆65Feb 13, 2023Updated 3 years ago
Alternatives and similar repositories for GSM-IC
Users that are interested in GSM-IC are comparing it to the libraries listed below
Sorting:
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems.☆64Jul 8, 2024Updated last year
- Code and Data for our EMNLP 2020 paper titled 'Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multiho…☆28Feb 9, 2022Updated 4 years ago
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003)☆23Oct 2, 2025Updated 5 months ago
- ☆88Jun 1, 2023Updated 2 years ago
- ☆10Feb 6, 2025Updated last year
- ☆52Jul 4, 2023Updated 2 years ago
- Official repository for ALT (ALignment with Textual feedback).☆10Jul 25, 2024Updated last year
- ☆29Oct 18, 2022Updated 3 years ago
- ☆13Mar 1, 2022Updated 4 years ago
- ☆14May 7, 2024Updated last year
- ☆11Oct 3, 2021Updated 4 years ago
- [ICML'2024] Can AI Assistants Know What They Don't Know?☆85Feb 5, 2024Updated 2 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?"☆59Jan 12, 2023Updated 3 years ago
- The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval".☆28Jun 19, 2021Updated 4 years ago
- UDapter is a multilingual dependency parser that uses "contextual" adapters together with language-typology features for language-specifi…☆31Dec 5, 2022Updated 3 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w…☆31Jan 14, 2023Updated 3 years ago
- ☆15May 15, 2025Updated 9 months ago
- ☆30Dec 27, 2024Updated last year
- Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation☆31May 8, 2023Updated 2 years ago
- (NAACL 2024) Guiding Large Language Models to Post-Edit Machine Translation with Error Annotations☆15Apr 14, 2025Updated 10 months ago
- Pytorch version of the CVPR2014 paper: "Deep CNN-Based Blind Image Quality Predictor."☆13Aug 18, 2021Updated 4 years ago
- Code Repository for "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models".☆15Oct 14, 2022Updated 3 years ago
- Codebase of ACL2024 paper "Spiral of Silence: How is Large Language Model Killing Information Retrieval?—A Case Study on Open Domain Ques…☆16Jun 4, 2024Updated last year
- Source code for ACL 2021 paper "Automatic ICD Coding via Interactive Shared Representation Networks with Self-distillation Mechanism"☆14Jun 1, 2021Updated 4 years ago
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations"☆69Feb 27, 2024Updated 2 years ago
- A publishing website of a table collecting meta-learning-related papers in the area of human language processing.☆17Aug 2, 2021Updated 4 years ago
- 测试 https://huggingface.co/OFA-Sys/gsm8k-rft-llama7b-u13b 的 GSM8K 分数☆15Aug 10, 2023Updated 2 years ago
- ☆52Oct 23, 2023Updated 2 years ago
- [DMLR 2024] Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift☆38Jan 25, 2024Updated 2 years ago
- ☆16Oct 24, 2021Updated 4 years ago
- Official code for the paper: "Metadata Archaeology"☆19May 10, 2023Updated 2 years ago
- Minimum Description Length probing for neural network representations☆20Jan 28, 2025Updated last year
- Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics".☆16May 3, 2022Updated 3 years ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning".☆165May 7, 2024Updated last year
- Implementation of the paper: "Making Retrieval-Augmented Language Models Robust to Irrelevant Context"☆76Aug 6, 2024Updated last year
- ☆43Sep 3, 2024Updated last year
- ☆45Sep 21, 2024Updated last year
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support…☆38Jul 27, 2023Updated 2 years ago
- A flexible and efficient training framework for large-scale alignment tasks☆451Oct 23, 2025Updated 4 months ago