yale-nlp / DocMath-Eval
Data and Code for ACL 2024 paper "DocMath-Eval: Evaluating Math Reasoning Capabilities of LLMs in Understanding Long and Specialized Documents"
☆23 Updated last year
Alternatives and similar repositories for DocMath-Eval
Users who are interested in DocMath-Eval are comparing it to the repositories listed below.
- Data and Code for EMNLP 2023 paper "QTSumm: Query-Focused Summarization over Tabular Data" ☆22 Updated last year
- ☆33 Updated last year
- Merging Generated and Retrieved Knowledge for Open-Domain QA (EMNLP 2023) ☆22 Updated 2 years ago
- A comprehensive paper list of Reasoning over Tables. ☆30 Updated 3 years ago
- Dataset for TACL 2022 paper: "FeTaQA: Free-form Table Question Answering" ☆85 Updated 2 years ago
- ☆26 Updated 2 years ago
- Code and dataset for the EMNLP paper "Instruct and Extract: Instruction Tuning for On-Demand Information Extraction" ☆54 Updated last year
- Repo for outstanding paper @ ACL 2023 "Do PLMs Know and Understand Ontological Knowledge?" ☆33 Updated 2 years ago
- Data and code for ACL 2022 paper "MultiHiertt: Numerical Reasoning over Multi Hierarchical Tabular and Textual Data" ☆51 Updated last year
- First explanation metric (diagnostic report) for text generation evaluation ☆62 Updated 9 months ago
- ☆31 Updated 10 months ago
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey" ☆59 Updated 2 years ago
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆82 Updated 2 years ago
- WikiWhy is a new benchmark for evaluating LLMs' ability to explain between cause-effect relationships. It is a QA dataset containing 9000… ☆48 Updated 2 years ago
- [EMNLP 2023] C-STS: Conditional Semantic Textual Similarity ☆73 Updated last year
- ☆88 Updated 2 years ago
- [ACL 23] CodeIE: Large Code Generation Models are Better Few-Shot Information Extractors ☆40 Updated 2 weeks ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI 2024) ☆50 Updated last year
- Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context ☆18 Updated last year
- 🩺 A collection of ChatGPT evaluation reports on various benchmarks. ☆50 Updated 2 years ago
- ☆57 Updated last year
- ☆32 Updated 2 years ago
- [ACL 2022] A hierarchical table dataset for question answering and data-to-text generation. ☆103 Updated 2 weeks ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 Updated 2 years ago
- Dataset and baseline for Coling 2022 long paper (oral): "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech" ☆11 Updated 2 years ago
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning ☆36 Updated 2 years ago
- [ACL 2023] Plug-and-Play Knowledge Injection for Pre-trained Language Models ☆62 Updated last year
- [Findings of ACL'2023] Improving Contrastive Learning of Sentence Embeddings from AI Feedback ☆40 Updated 2 years ago
- Code and Data for NeurIPS 2021 Paper "A Dataset for Answering Time-Sensitive Questions" ☆75 Updated 3 years ago
- Repository for the paper "Cognitive Mirage: A Review of Hallucinations in Large Language Models" ☆47 Updated 2 years ago