EternityYW / TRAM-Benchmark
TRAM: Benchmarking Temporal Reasoning for Large Language Models (Findings of ACL 2024)
☆26 Updated last year
Alternatives and similar repositories for TRAM-Benchmark
Users who are interested in TRAM-Benchmark are comparing it to the repositories listed below.
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆109 Updated 2 years ago
- ☆27 Updated 2 years ago
- Methods and evaluation for aligning language models temporally ☆30 Updated last year
- The Unreliability of Explanations in Few-shot Prompting for Textual Reasoning (NeurIPS 2022) ☆16 Updated 2 years ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆119 Updated last year
- Semi-Parametric Editing with a Retrieval-Augmented Counterfactual Model ☆71 Updated 3 years ago
- AbstainQA, ACL 2024 ☆28 Updated this week
- Code for the ACL-2022 paper "Knowledge Neurons in Pretrained Transformers" ☆173 Updated last year
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023) ☆63 Updated 2 years ago
- Resources for Retrieval Augmentation for Commonsense Reasoning: A Unified Approach. EMNLP 2022. ☆23 Updated 3 years ago
- Code for reproducing the ACL'23 paper: Don't Generate, Discriminate: A Proposal for Grounding Language Models to Real-World Environments ☆78 Updated 8 months ago
- [ICML 2023] Code for our paper “Compositional Exemplars for In-context Learning”. ☆102 Updated 2 years ago
- ☆57 Updated last year
- [EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”. ☆27 Updated 3 years ago
- ☆48 Updated 2 years ago
- Data and code for the ICLR 2023 paper "Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning". ☆165 Updated 2 years ago
- ☆28 Updated last year
- [NAACL'25 Oral] Steering Knowledge Selection Behaviours in LLMs via SAE-Based Representation Engineering ☆72 Updated 3 weeks ago
- Evaluating the Ripple Effects of Knowledge Editing in Language Models ☆56 Updated last year
- ☆103 Updated 2 years ago
- The official code and dataset for EMNLP 2022 paper "COPEN: Probing Conceptual Knowledge in Pre-trained Language Models". ☆21 Updated 2 years ago
- ☆41 Updated 2 years ago
- Supporting code for ReCEval paper ☆31 Updated last year
- Code and Data for NeurIPS2021 Paper "A Dataset for Answering Time-Sensitive Questions" ☆75 Updated 3 years ago
- Official repo for ACL 2023 paper Code4Struct: Code Generation for Few-Shot Structured Prediction from Natural Language. ☆43 Updated 2 years ago
- ☆177 Updated last year
- ☆50 Updated 2 years ago
- ☆57 Updated 8 months ago
- ☆37 Updated last year
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆29 Updated last year