Synthetic question-answering dataset to formally analyze the chain-of-thought output of large language models on a reasoning task.
☆155 · Sep 9, 2025 · Updated 5 months ago
Alternatives and similar repositories for prontoqa
Users who are interested in prontoqa are comparing it to the repositories listed below.
- Official code repository for the main conference paper in EMNLP 2022: SubeventWriter: Iterative Sub-event Sequence Generation with Cohere… ☆11 · Oct 16, 2022 · Updated 3 years ago
- Code Repo for "Differentiable Open-Ended Commonsense Reasoning" (NAACL 2021) ☆32 · Jun 30, 2023 · Updated 2 years ago
- This repository contains a collection of papers and resources on Reasoning in Large Language Models. ☆567 · Nov 13, 2023 · Updated 2 years ago
- Dataset & Code for Com2Sense Benchmark ☆13 · Sep 8, 2021 · Updated 4 years ago
- This is the code repo for Findings of EMNLP2022 paper: MICO: a multi-alternative contrastive learning framework for commonsense knowledg… ☆10 · Nov 29, 2022 · Updated 3 years ago
- Official code repository for Findings of EMNLP 2022 paper: PseudoReasoner: Leveraging Pseudo Labels for Commonsense Knowledge Base Popula… ☆11 · Oct 18, 2022 · Updated 3 years ago
- Neuro-Symbolic Integration Brings Causal and Reliable Reasoning Proofs ☆41 · Feb 15, 2024 · Updated 2 years ago
- ☆139 · Dec 22, 2023 · Updated 2 years ago
- Source code for the paper 'Complex Hyperbolic Knowledge Graph Embeddings with Fast Fourier Transform'. ☆12 · Nov 9, 2022 · Updated 3 years ago
- CopyBench: Measuring Literal and Non-Literal Reproduction of Copyright-Protected Text in Language Model Generation ☆14 · Aug 19, 2025 · Updated 6 months ago
- ☆12 · Apr 25, 2022 · Updated 3 years ago
- The project page for "LOGIC-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning" ☆381 · Jun 13, 2024 · Updated last year
- Codes for the EMNLP2021 paper: Benchmarking Commonsense Knowledge Base Population (https://aclanthology.org/2021.emnlp-main.705.pdf). An … ☆26 · Feb 14, 2024 · Updated 2 years ago
- Train large COMET (T5-3B/GPT2-XL) with small memory (on 11GB memory GPUs like 1080/2080) using DeepSpeed. ☆14 · Jan 23, 2022 · Updated 4 years ago
- Code and data for TACL paper It’s not Rocket Science: Interpreting Figurative Language in Narratives ☆15 · Sep 4, 2023 · Updated 2 years ago
- ☆17 · Apr 7, 2025 · Updated 10 months ago
- Official Code for EMNLP2023 Main Conference paper: "KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detec… ☆30 · Nov 14, 2023 · Updated 2 years ago
- Bridging the Generalization Gap in Text-to-SQL Parsing with Schema Expansion ☆14 · Jul 26, 2023 · Updated 2 years ago
- Data for the MTEB leaderboard ☆46 · Feb 23, 2026 · Updated last week
- [EMNLP 2023] Once Upon a *Time* in *Graph*: Relative-Time Pretraining for Complex Temporal Reasoning ☆17 · Oct 31, 2023 · Updated 2 years ago
- The official code and dataset for EMNLP 2022 paper "COPEN: Probing Conceptual Knowledge in Pre-trained Language Models". ☆21 · Mar 9, 2023 · Updated 2 years ago
- [Findings of EMNLP22] From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models ☆19 · Mar 16, 2023 · Updated 2 years ago
- Official code repository for the main conference paper in ACL2023: COLA: Contextualized Commonsense Causality Reasoning from the Causal I… ☆33 · May 12, 2023 · Updated 2 years ago
- We are creating a challenging new benchmark MultiReQA: A Cross-Domain Evaluation for Retrieval Question Answering Models. Retrieval quest… ☆31 · Jul 9, 2020 · Updated 5 years ago
- Code repo for MathAgent ☆19 · Dec 15, 2023 · Updated 2 years ago
- ☆18 · Feb 25, 2022 · Updated 4 years ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆548 · Jun 25, 2024 · Updated last year
- EMNLP 2022: Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443 ☆86 · Sep 15, 2024 · Updated last year
- A fast and neat API for Conceptualization of Probase ☆17 · Oct 28, 2019 · Updated 6 years ago
- ☆187 · Jul 2, 2025 · Updated 8 months ago
- An extensible benchmark for evaluating large language models on planning ☆451 · Sep 17, 2025 · Updated 5 months ago
- Scripts for downloading and pre-processing the `proof-pile`, a high quality dataset of mathematical text and code. ☆22 · Nov 26, 2022 · Updated 3 years ago
- WinoWhy provides human-annotated reasons for answering WSC questions. ☆18 · May 13, 2020 · Updated 5 years ago
- ☆36 · Dec 20, 2024 · Updated last year
- Grade-School Math with Irrelevant Context (GSM-IC) benchmark is an arithmetic reasoning dataset built upon GSM8K, by adding irrelevant se… ☆65 · Feb 13, 2023 · Updated 3 years ago
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆85 · Oct 31, 2022 · Updated 3 years ago
- [ICML 2023] Tuning Language Models as Training Data Generators for Augmentation-Enhanced Few-Shot Learning ☆44 · May 10, 2023 · Updated 2 years ago
- ☆25 · Aug 23, 2024 · Updated last year
- Temporal Commonsense Reasoning in Dialog ☆72 · Jun 9, 2021 · Updated 4 years ago