minnesotanlp / cobbler
Code and data for Koo et al.'s ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators"
☆18 Updated 11 months ago
Alternatives and similar repositories for cobbler:
Users who are interested in cobbler are comparing it to the repositories listed below.
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ☆69 Updated 8 months ago
- [ACL 2023] Knowledge Unlearning for Mitigating Privacy Risks in Language Models ☆79 Updated 4 months ago
- AVocaDo: Strategy for Adapting Vocabulary to Downstream Domain ☆23 Updated 2 years ago
- ☆62 Updated 2 years ago
- Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (h… ☆81 Updated 4 years ago
- [ICLR 2022] Towards Continual Knowledge Learning of Language Models ☆92 Updated 2 years ago
- Codebase, data and models for the SummaC paper in TACL ☆87 Updated 2 weeks ago
- Code for ACL 2022 paper "Semi-Supervised Formality Style Transfer with Consistency Training". ☆17 Updated 2 years ago
- ☆25 Updated 2 years ago
- 🐥 Code and Dataset for our EMNLP 2022 paper - "ProsocialDialog: A Prosocial Backbone for Conversational Agents" ☆61 Updated last year
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆70 Updated 3 years ago
- ☆43 Updated last year
- ☆10 Updated last year
- ☆17 Updated last month
- Code associated with ACL 2021 DExperts paper ☆114 Updated last year
- Test code of Inverse cloze task for information retrieval ☆33 Updated 4 years ago
- ☆15 Updated 2 years ago
- Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever (TACL 2023)" ☆62 Updated 2 years ago
- The official implementation of "Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks" (NAACL 2022). ☆43 Updated 2 years ago
- PyTorch Implementation of EncT5: Fine-tuning T5 Encoder for Non-autoregressive Tasks ☆63 Updated 2 years ago
- Data and code for "A Question Answering Evaluation Framework for Faithfulness Assessment in Abstractive Summarization" (ACL 2020) ☆47 Updated last year
- We construct and introduce DIALFACT, a testing benchmark dataset crowd-annotated conversational claims, paired with pieces of evidence fr… ☆41 Updated 2 years ago
- ☆44 Updated last year
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering. ☆17 Updated last year
- FRANK: Factuality Evaluation Benchmark ☆52 Updated 2 years ago
- This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?” ☆85 Updated 2 years ago
- Prompt-and-Rerank: A Method for Zero-Shot and Few-Shot Textual Style Transfer ☆34 Updated 2 years ago
- ☆75 Updated last year
- ☆10 Updated 4 months ago
- ☆16 Updated last year