The Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built on GSM8K by adding irrelevant sentences to the problem descriptions. GSM-IC is designed to evaluate how easily language models are distracted by irrelevant information.
☆65 · updated Feb 13, 2023
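To make the construction concrete, here is a minimal Python sketch of adding one irrelevant sentence to a GSM8K-style problem and measuring distractibility as the accuracy drop between the original and modified problems. It is not the official GSM-IC construction pipeline. It assumes the distractor is placed directly before the final question sentence and that answers are compared by exact string match. The function names `add_irrelevant_context` and `accuracy` and the example problem are hypothetical.

```python
import re


def add_irrelevant_context(problem: str, distractor: str) -> str:
    """Insert a distractor sentence just before the final (question) sentence.

    Assumption (not from the GSM-IC release): sentences end in '.', '!' or '?'
    followed by whitespace, and the last sentence is the question.
    """
    # Split on whitespace that follows terminal punctuation, keeping the
    # punctuation attached to each sentence ("3.5" is not split).
    sentences = re.split(r"(?<=[.!?])\s+", problem.strip())
    if len(sentences) < 2:
        # No separate question sentence found, so prepend the distractor.
        return f"{distractor} {problem.strip()}"
    body, question = sentences[:-1], sentences[-1]
    return " ".join(body + [distractor, question])


def accuracy(predictions: list, answers: list) -> float:
    """Exact-match accuracy (hypothetical scoring; assumes normalized strings)."""
    if len(predictions) != len(answers) or not answers:
        raise ValueError("predictions and answers must be non-empty and equal length")
    correct = sum(p.strip() == a.strip() for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    # Hypothetical example problem, not taken from GSM8K or GSM-IC.
    original = "Lucy has 3 apples. She buys 4 more apples. How many apples does Lucy have now?"
    modified = add_irrelevant_context(original, "Lucy's brother is 12 years old.")
    print(modified)

    # Distractibility as the accuracy drop from original to modified problems.
    drop = accuracy(["7"], ["7"]) - accuracy(["19"], ["7"])
    print(f"accuracy drop: {drop:.2f}")
```

Placing the distractor immediately before the question means it cannot be skipped by a model that only reads the setup and question together. A fuller reproduction would vary the distractor's topic, role names, and numbers rather than using a single fixed sentence.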
Alternatives and similar repositories for GSM-IC
Users interested in GSM-IC are comparing it with the repositories listed below.
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆64 · updated Jul 8, 2024
- Code and Data for our EMNLP 2020 paper titled 'Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multiho… ☆28 · updated Feb 9, 2022
- [AAAI 2025] Augmenting Math Word Problems via Iterative Question Composing (https://arxiv.org/abs/2401.09003) ☆23 · updated Oct 2, 2025
- ☆88 · updated Jun 1, 2023
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks ☆12 · updated Sep 1, 2023
- Code for Findings of ACL 2021 paper "Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain … ☆19 · updated Dec 16, 2022
- Official repository for ALT (ALignment with Textual feedback). ☆10 · updated Jul 25, 2024
- ☆10 · updated Feb 6, 2025
- ☆52 · updated Jul 4, 2023
- ☆13 · updated Mar 1, 2022
- Source Code for the JAIR Paper "Does CLIP Know my Face?" (Demo: https://huggingface.co/spaces/AIML-TUDA/does-clip-know-my-face) ☆16 · updated Jul 9, 2024
- ☆14 · updated May 7, 2024
- ☆11 · updated Oct 3, 2021
- [ICML'2024] Can AI Assistants Know What They Don't Know? ☆85 · updated Feb 5, 2024
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆59 · updated Jan 12, 2023
- The official implementation for ACL 2021 "Challenges in Information Seeking QA: Unanswerable Questions and Paragraph Retrieval". ☆28 · updated Jun 19, 2021
- ☆15 · updated May 15, 2025
- ☆30 · updated Dec 27, 2024
- Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation ☆31 · updated May 8, 2023
- The open source implementation of "Connecting Large Language Models with Evolutionary Algorithms Yields Powerful Prompt Optimizers" ☆19 · updated Mar 11, 2024
- Code Repository for "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models". ☆15 · updated Oct 14, 2022
- PyTorch version of the CVPR2014 paper: "Deep CNN-Based Blind Image Quality Predictor." ☆13 · updated Aug 18, 2021
- The accompanying code for "Injecting Numerical Reasoning Skills into Language Models" (Mor Geva*, Ankit Gupta* and Jonathan Berant, ACL 2… ☆89 · updated Aug 20, 2024
- Source code for ACL 2021 paper "Automatic ICD Coding via Interactive Shared Representation Networks with Self-distillation Mechanism" ☆14 · updated Jun 1, 2021
- Codebase of ACL2024 paper "Spiral of Silence: How is Large Language Model Killing Information Retrieval?—A Case Study on Open Domain Ques… ☆16 · updated Jun 4, 2024
- A website publishing a table of meta-learning papers in human language processing. ☆17 · updated Aug 2, 2021
- Tests the GSM8K score of https://huggingface.co/OFA-Sys/gsm8k-rft-llama7b-u13b ☆15 · updated Aug 10, 2023
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆69 · updated Feb 27, 2024
- An introduction to high-quality chit-chat data ☆30 · updated Dec 12, 2018
- ☆52 · updated Oct 23, 2023
- [DMLR 2024] Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift ☆38 · updated Jan 25, 2024
- Minimum Description Length probing for neural network representations ☆20 · updated Jan 28, 2025
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆18 · updated Apr 25, 2021
- ☆16 · updated Oct 24, 2021
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆165 · updated May 7, 2024
- ☆45 · updated Sep 21, 2024
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆38 · updated Jul 27, 2023
- Complexity-Based Prompting for Multi-Step Reasoning ☆17 · updated Mar 10, 2023
- [NAACL 2024] Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models ☆86 · updated Mar 13, 2024