Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering"
☆87Aug 12, 2024Updated last year
Alternatives and similar repositories for instruct-qa
Users that are interested in instruct-qa are comparing it to the libraries listed below.
- Code and data for reproducing baselines for TopiOCQA, an open-domain conversational question-answering dataset☆57Nov 15, 2023Updated 2 years ago
- Code for "Can Retriever-Augmented Language Models Reason? The Blame Game Between the Retriever and the Language Model", EMNLP Findings 20…☆28Nov 2, 2023Updated 2 years ago
- Code and data for "KoDialogBench: Evaluating Conversational Understanding of Language Models with Korean Dialogue Benchmark" (LREC-COLING…☆18Apr 15, 2025Updated last year
- TACL 2025: Investigating Adversarial Trigger Transfer in Large Language Models☆19Aug 17, 2025Updated 10 months ago
- Code and data setup for the paper "Are Diffusion Models Vision-and-language Reasoners?"☆33Mar 15, 2024Updated 2 years ago
- Retrieval Augmented Generation Generalized Evaluation Dataset☆61Jul 16, 2025Updated 11 months ago
- Code and data for the IWSLT 2022 shared task on Formality Control for SLT☆22May 24, 2023Updated 3 years ago
- ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models☆47Jan 12, 2024Updated 2 years ago
- Korean LLM leaderboard and model performance/safety management☆22Sep 26, 2023Updated 2 years ago
- ☆18Nov 5, 2025Updated 7 months ago
- ConvGQR: Generative Query Reformulation for Conversational Search. A codebase for ACL 2023 accepted paper.☆35Mar 5, 2024Updated 2 years ago
- Source code for SIGIR 2022 paper.☆16Apr 25, 2022Updated 4 years ago
- SafeArena is a benchmark for assessing the harmful capabilities of web agents☆24Apr 23, 2025Updated last year
- codebase release for EMNLP2023 paper publication☆19Sep 18, 2025Updated 9 months ago
- ☆51Feb 5, 2023Updated 3 years ago
- ☆13Sep 6, 2022Updated 3 years ago
- Implementation of AdaCQR (COLING 2025)☆15Dec 30, 2024Updated last year
- Code for paper 'Are We Falling in a Middle-Intelligence Trap? An Analysis and Mitigation of the Reversal Curse'☆14Aug 2, 2024Updated last year
- Ranger helps you see the forest among the trees - Ranger is an effect-size meta analysis library creating beautiful forest plots!☆12Jun 12, 2023Updated 3 years ago
- ☆25Oct 22, 2022Updated 3 years ago
- ☆11Sep 24, 2024Updated last year
- Code, datasets, and checkpoints for the paper "Improving Passage Retrieval with Zero-Shot Question Generation (EMNLP 2022)"☆101Nov 27, 2022Updated 3 years ago
- Repo for Llatrieval☆32Aug 21, 2024Updated last year
- ☆10Dec 2, 2024Updated last year
- Awesome LLM for NLG Evaluation Papers☆26Jan 23, 2024Updated 2 years ago
- Performs benchmarking on two Korean datasets with minimal time and effort.☆45Jan 22, 2026Updated 5 months ago
- Cross-lingual Fact-to-Text Alignment and Generation for Low-Resource Languages☆11Jan 1, 2023Updated 3 years ago
- [EMNLP 2024] A Retrieval Benchmark for Scientific Literature Search☆109Dec 2, 2024Updated last year
- Source code of the paper: RetrievalQA: Assessing Adaptive Retrieval-Augmented Generation for Short-form Open-Domain Question Answering [F…☆68May 28, 2024Updated 2 years ago
- Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table☆28Jun 2, 2023Updated 3 years ago
- https://arxiv.org/abs/2404.10917☆14Mar 18, 2025Updated last year
- Code & data for EMNLP 2020 paper "MOCHA: A Dataset for Training and Evaluating Reading Comprehension Metrics".☆16May 3, 2022Updated 4 years ago
- Code for Evaluating Explanations for Reading Comprehension with Realistic Counterfactuals.☆17Apr 25, 2021Updated 5 years ago
- A Survey of Attributions for Large Language Models☆229Jan 14, 2026Updated 5 months ago
- The Benefits of a Concise Chain of Thought on Problem Solving in Large Language Models☆25Nov 25, 2024Updated last year
- Transparent Reporting of Ethics for Generative AI (TREGAI) Checklist☆15Oct 16, 2024Updated last year
- Reference implementation for Reward-Augmented Decoding: Efficient Controlled Text Generation With a Unidirectional Reward Model☆45Oct 1, 2025Updated 9 months ago
- Multilingual Pre-training with Language and Task Adaptation for Multilingual Text Style Transfer (ACL 2022)☆10Sep 22, 2022Updated 3 years ago
- Repository for "Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators"☆12Mar 25, 2025Updated last year