First explanation metric (diagnostic report) for text generation evaluation
☆62Mar 3, 2025Updated last year
Alternatives and similar repositories for InstructScore_SEScore3
Users that are interested in InstructScore_SEScore3 are comparing it to the libraries listed below.
- ☆17Mar 3, 2025Updated last year
- Crawled Wikipedia Tables with Passages☆14Aug 19, 2021Updated 4 years ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation☆217Feb 10, 2024Updated 2 years ago
- ☆36May 25, 2023Updated 2 years ago
- The project page for "SCITAB: A Challenging Benchmark for Compositional Reasoning and Claim Verification on Scientific Tables"☆23Dec 21, 2023Updated 2 years ago
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks.☆129Apr 23, 2026Updated last week
- Benchmark for evaluating open-ended generation☆51Nov 6, 2024Updated last year
- "TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks" [TMLR 2024]☆33Dec 21, 2024Updated last year
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization☆13Mar 20, 2025Updated last year
- Data and code for the EMNLP 2023 System Demo paper "QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking"☆19Dec 19, 2023Updated 2 years ago
- Official implementations for (1) BlonDe: An Automatic Evaluation Metric for Document-level Machine Translation and (2) Discourse Centric …☆84Sep 21, 2023Updated 2 years ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod…☆14Apr 28, 2023Updated 3 years ago
- Code for SIGdial 2020 paper: Unsupervised Evaluation of Interactive Dialog with DialoGPT (https://arxiv.org/abs/2006.12719)☆28Jun 8, 2020Updated 5 years ago
- Microsoft question-answering dataset☆10Jun 16, 2023Updated 2 years ago
- Code for the ACL 2023 paper "Fact-Checking Complex Claims with Program-Guided Reasoning"☆31Jun 2, 2023Updated 2 years ago
- Joint use of the CPO and SimPO methods for better reference-free preference learning.☆57Aug 13, 2024Updated last year
- An unnecessarily tiny and minimal implementation of GPT-2 in NumPy.☆11Feb 12, 2023Updated 3 years ago
- ☆144Sep 10, 2023Updated 2 years ago
- ☆18Oct 19, 2020Updated 5 years ago
- AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation (EMNLP 2024 Findings)☆16Dec 30, 2024Updated last year
- ☆14Oct 17, 2023Updated 2 years ago
- Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-…☆36Jan 23, 2025Updated last year
- ☆32Feb 8, 2024Updated 2 years ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"☆37Oct 11, 2023Updated 2 years ago
- [EMNLP 2021] Dataset and PyTorch Code for ExplaGraphs: An Explanation Graph Generation Task for Structured Commonsense Reasoning☆14Nov 5, 2022Updated 3 years ago
- The Conceptual Coverage Across Languages Benchmark for Text-to-Image Models☆12Oct 28, 2024Updated last year
- Resource, Evaluation and Detection Papers for ChatGPT☆455Mar 21, 2024Updated 2 years ago
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs☆19Mar 20, 2025Updated last year
- [ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT☆91Oct 14, 2025Updated 6 months ago
- WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000…☆49Dec 7, 2023Updated 2 years ago
- The Official Repository for the Automatic Dialogue Evaluation Sub-task of DSTC10 Track 5 (Automatic Evaluation and Moderation of Open-dom…☆19Nov 1, 2021Updated 4 years ago
- A Neural Framework for MT Evaluation☆745Apr 21, 2026Updated 2 weeks ago
- Model-in-the-loop approach for figurative language generation and explainability. Code and data for the EMNLP 2022 paper FLUTE: Figurative Language Unders…☆13Apr 22, 2023Updated 3 years ago
- Metaskill: A Meta-Skill for Autonomous AI Agent Team Generation☆35Feb 23, 2026Updated 2 months ago
- MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va…☆12Nov 6, 2023Updated 2 years ago
- [COLM '24] Source-Aware Training Enables Knowledge Attribution in Language Models☆19Apr 1, 2025Updated last year
- Code repo for SIGIR 2021 paper "Few-Shot Conversational Dense Retrieval"☆43Dec 9, 2021Updated 4 years ago
- Evaluate the Quality of Critique☆37Jun 1, 2024Updated last year
- ☆12Jan 5, 2023Updated 3 years ago