google-research-datasets / GSM-IC
The Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built on GSM8K by adding irrelevant sentences to the problem descriptions. GSM-IC is constructed to evaluate the distractibility of language models.
☆64 · Feb 13, 2023 · Updated 3 years ago
Alternatives and similar repositories for GSM-IC
Users who are interested in GSM-IC are comparing it to the repositories listed below.
- GSM-Plus: Data, Code, and Evaluation for Enhancing Robust Mathematical Reasoning in Math Word Problems. ☆64 · Jul 8, 2024 · Updated last year
- Code and Data for our EMNLP 2020 paper titled 'Learning to Explain: Datasets and Models for Identifying Valid Reasoning Chains in Multiho… ☆28 · Feb 9, 2022 · Updated 4 years ago
- ☆88 · Jun 1, 2023 · Updated 2 years ago
- Code for Findings of ACL 2021 paper "Addressing Inquiries about History: An Efficient and Practical Framework for Evaluating Open-domain … ☆19 · Dec 16, 2022 · Updated 3 years ago
- Prompt-Guided Retrieval For Non-Knowledge-Intensive Tasks ☆12 · Sep 1, 2023 · Updated 2 years ago
- ☆10 · Jan 28, 2024 · Updated 2 years ago
- ☆52 · Jul 4, 2023 · Updated 2 years ago
- ☆10 · Feb 6, 2025 · Updated last year
- Official repository for ALT (ALignment with Textual feedback). ☆10 · Jul 25, 2024 · Updated last year
- ☆11 · Oct 3, 2021 · Updated 4 years ago
- ☆13 · Mar 1, 2022 · Updated 3 years ago
- ☆14 · May 7, 2024 · Updated last year
- [ICML'2024] Can AI Assistants Know What They Don't Know? ☆85 · Feb 5, 2024 · Updated 2 years ago
- Repo for ICML23 "Why do Nearest Neighbor Language Models Work?" ☆59 · Jan 12, 2023 · Updated 3 years ago
- AIS is an evaluation framework for assessing whether the output of natural language models only contains information about the external w… ☆31 · Jan 14, 2023 · Updated 3 years ago
- ☆36 · Oct 29, 2024 · Updated last year
- ☆15 · May 15, 2025 · Updated 9 months ago
- ☆30 · Dec 27, 2024 · Updated last year
- Difference-aware Knowledge Selection for Knowledge-grounded Conversation Generation ☆31 · May 8, 2023 · Updated 2 years ago
- Code Repository for "A Causal Framework to Quantify the Robustness of Mathematical Reasoning with Language Models". ☆15 · Oct 14, 2022 · Updated 3 years ago
- SOTA work about out-of-distribution detection ☆14 · Mar 5, 2021 · Updated 4 years ago
- Pytorch version of the CVPR2014 paper: "Deep CNN-Based Blind Image Quality Predictor." ☆13 · Aug 18, 2021 · Updated 4 years ago
- Code & Data for our Paper "Alleviating Hallucinations of Large Language Models through Induced Hallucinations" ☆69 · Feb 27, 2024 · Updated last year
- Testing the GSM8K score of https://huggingface.co/OFA-Sys/gsm8k-rft-llama7b-u13b ☆15 · Aug 10, 2023 · Updated 2 years ago
- Source code for ACL 2021 paper "Automatic ICD Coding via Interactive Shared Representation Networks with Self-distillation Mechanism" ☆14 · Jun 1, 2021 · Updated 4 years ago
- Introduction to high-quality chit-chat data ☆30 · Dec 12, 2018 · Updated 7 years ago
- ☆51 · Oct 23, 2023 · Updated 2 years ago
- [DMLR 2024] Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift ☆38 · Jan 25, 2024 · Updated 2 years ago
- Minimum Description Length probing for neural network representations ☆20 · Jan 28, 2025 · Updated last year
- Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics". ☆16 · May 3, 2022 · Updated 3 years ago
- Official code for the paper: "Metadata Archaeology" ☆19 · May 10, 2023 · Updated 2 years ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆18 · Apr 25, 2021 · Updated 4 years ago
- Code and data accompanying our paper on arXiv "Faithful Chain-of-Thought Reasoning". ☆165 · May 7, 2024 · Updated last year
- ☆43 · Sep 3, 2024 · Updated last year
- Generating diverse counterfactual data for Natural Language Understanding tasks using Large Language Models (LLMs). The generator support… ☆38 · Jul 27, 2023 · Updated 2 years ago
- Complexity Based Prompting for Multi-Step Reasoning ☆17 · Mar 10, 2023 · Updated 2 years ago
- ☆48 · Jan 21, 2024 · Updated 2 years ago
- A flexible and efficient training framework for large-scale alignment tasks ☆449 · Oct 23, 2025 · Updated 3 months ago
- Code associated with Tuning Language Models by Proxy (Liu et al., 2024) ☆127 · Mar 30, 2024 · Updated last year