Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"
☆87 · Aug 12, 2024 · Updated last year
Alternatives and similar repositories for instruct-qa
Users who are interested in instruct-qa are comparing it to the libraries listed below.
- Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20… ☆28 · Nov 2, 2023 · Updated 2 years ago
- Code and data for reproducing baselines for TopiOCQA, an open-domain conversational question-answering dataset ☆56 · Nov 15, 2023 · Updated 2 years ago
- ☆18 · Nov 5, 2025 · Updated 4 months ago
- Code and data for "KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark" (LREC-COLING… ☆17 · Apr 15, 2025 · Updated 10 months ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT ☆22 · May 24, 2023 · Updated 2 years ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆61 · Jul 16, 2025 · Updated 7 months ago
- Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages ☆11 · Jan 1, 2023 · Updated 3 years ago
- ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models ☆47 · Jan 12, 2024 · Updated 2 years ago
- Korean LLM leaderboard and model performance/safety management ☆22 · Sep 26, 2023 · Updated 2 years ago
- Ranger helps you see the forest among the trees. Ranger is an effect-size meta-analysis library creating beautiful forest plots! ☆11 · Jun 12, 2023 · Updated 2 years ago
- FeedbackQA: Improving Question Answering Post-Deployment with Interactive Feedback ☆12 · Jul 13, 2022 · Updated 3 years ago
- Implementation of AdaCQR (COLING 2025) ☆13 · Dec 30, 2024 · Updated last year
- https://arxiv.org/abs/2404.10917 ☆14 · Mar 18, 2025 · Updated 11 months ago
- Data and code for paper "ODSum: New Benchmarks for Open Domain Multi-Document Summarization" ☆11 · Sep 20, 2024 · Updated last year
- ☆11 · Sep 24, 2024 · Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆154 · Aug 18, 2025 · Updated 6 months ago
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?" ☆33 · Mar 15, 2024 · Updated last year
- Repo for Llatrieval ☆31 · Aug 21, 2024 · Updated last year
- The LAMP Platform (issues and documentation). ☆14 · Feb 18, 2026 · Updated 2 weeks ago
- ☆10 · Dec 2, 2024 · Updated last year
- ☆13 · Sep 6, 2022 · Updated 3 years ago
- Code for paper 'Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse' ☆13 · Aug 2, 2024 · Updated last year
- ☆50 · Feb 5, 2023 · Updated 3 years ago
- ConvGQR: Generative Query Reformulation for Conversational Search. Codebase for an ACL 2023 accepted paper. ☆33 · Mar 5, 2024 · Updated 2 years ago
- ☆15 · Feb 28, 2024 · Updated 2 years ago
- Post-editing Datasets by Rakuten (PEDRa) ☆14 · Jun 23, 2021 · Updated 4 years ago
- In-Context Alignment: Chat with Vanilla Language Models Before Fine-Tuning ☆35 · Aug 9, 2023 · Updated 2 years ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search ☆102 · Dec 2, 2024 · Updated last year
- Codebase release for an EMNLP 2023 paper ☆19 · Sep 18, 2025 · Updated 5 months ago
- Framework for Cost-Effective Language Model Choice ☆16 · Dec 12, 2023 · Updated 2 years ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals. ☆18 · Apr 25, 2021 · Updated 4 years ago
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models ☆19 · Aug 17, 2025 · Updated 6 months ago
- [COLM 2025] "C3PO: Critical-Layer, Core-Expert, Collaborative Pathway Optimization for Test-Time Expert Re-Mixing" ☆20 · Apr 9, 2025 · Updated 10 months ago
- A framework that aims to wisely initialize unseen subword embeddings in PLMs for efficient large-scale continued pretraining ☆18 · Nov 26, 2023 · Updated 2 years ago
- Code and data from the paper 'Human Feedback is not Gold Standard' ☆20 · Feb 24, 2026 · Updated last week
- SafeArena is a benchmark for assessing the harmful capabilities of web agents ☆21 · Apr 23, 2025 · Updated 10 months ago
- ☆28 · May 27, 2024 · Updated last year
- ☆23 · Jan 27, 2025 · Updated last year
- Hercules: Attributable and Scalable Opinion Summarization (ACL 2023) ☆20 · Nov 8, 2023 · Updated 2 years ago