dxlong2000 / FormatBiasEval
[Preprint '24] LLMs Are Biased Towards Output Formats! Systematically Evaluating and Mitigating Output Format Bias of LLMs
☆11 Updated last year
Alternatives and similar repositories for FormatBiasEval
Users that are interested in FormatBiasEval are comparing it to the libraries listed below
- Awesome LLM for NLG Evaluation Papers ☆25 Updated last year
- ☆15 Updated 3 years ago
- HANNA, a large annotated dataset of Human-ANnotated NArratives for ASG evaluation. ☆35 Updated last year
- A comprehensive paper list of Reasoning over Tables. ☆29 Updated 3 years ago
- ☆62 Updated 3 years ago
- ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models ☆47 Updated last year
- Codebase, data and models for the SummaC paper in TACL ☆103 Updated 9 months ago
- ☆30 Updated 11 months ago
- Code and data for Koo et al's ACL 2024 paper "Benchmarking Cognitive Biases in Large Language Models as Evaluators" ☆21 Updated last year
- This code accompanies the paper DisentQA: Disentangling Parametric and Contextual Knowledge with Counterfactual Question Answering. ☆16 Updated 2 years ago
- Codes for ACL 2023 Paper "Fact-Checking Complex Claims with Program-Guided Reasoning" ☆31 Updated 2 years ago
- FRANK: Factuality Evaluation Benchmark ☆59 Updated 2 years ago
- ☆20 Updated last year
- Code for the paper "Open Domain Question Answering with A Unified Knowledge Interface" (ACL 2022) ☆56 Updated 2 years ago
- [EMNLP 2022] TemporalWiki: A Lifelong Benchmark for Training and Evaluating Ever-Evolving Language Models ☆74 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆150 Updated 3 months ago
- Code for ACL 2022 paper "Semi-Supervised Formality Style Transfer with Consistency Training". ☆17 Updated 3 years ago
- ☆68 Updated 11 months ago
- Code and models for the paper "Questions Are All You Need to Train a Dense Passage Retriever (TACL 2023)" ☆62 Updated 2 years ago
- ☆10 Updated 2 years ago
- ☆27 Updated 3 years ago
- Official implementation of the ACL 2023 paper: "Zero-shot Faithful Factual Error Correction" ☆17 Updated 2 years ago
- ☆15 Updated 3 years ago
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ☆32 Updated 7 months ago
- Codes for our paper "CTRLEval: An Unsupervised Reference-Free Metric for Evaluating Controlled Text Generation" (ACL 2022) ☆33 Updated 3 years ago
- The official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapte… ☆17 Updated last year
- ☆43 Updated 2 years ago
- The Dataset and Official Implementation for <Discursive Socratic Questioning: Evaluating the Faithfulness of Language Models’ Understandi… ☆17 Updated last year
- [ACL 2023] S3HQA: A Three-Stage Approach for Multi-hop Text-Table Hybrid Question Answering ☆20 Updated 5 months ago
- ☆20 Updated 2 years ago