First explanation metric (diagnostic report) for text generation evaluation
☆62 · Mar 3, 2025 · Updated last year
Alternatives and similar repositories for InstructScore_SEScore3
Users who are interested in InstructScore_SEScore3 are comparing it to the repositories listed below
- ☆17 · Mar 3, 2025 · Updated last year
- This repository combines the CPO and SimPO methods for better reference-free preference learning. ☆56 · Aug 13, 2024 · Updated last year
- Introduction and scripts for ACL-2020 paper "On Exposure Bias, Hallucination and Domain Shift in Neural Machine Translation" ☆21 · Jun 23, 2020 · Updated 5 years ago
- Tools for evaluating the performance of MT metrics on data from recent WMT metrics shared tasks. ☆126 · Oct 13, 2025 · Updated 4 months ago
- ☆36 · May 25, 2023 · Updated 2 years ago
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆215 · Feb 10, 2024 · Updated 2 years ago
- GEMBA: GPT Estimation Metric Based Assessment ☆146 · Dec 15, 2025 · Updated 2 months ago
- AgentsCourt: Building Judicial Decision-Making Agents with Court Debate Simulation and Legal Knowledge Augmentation (EMNLP 2024 Findings) ☆15 · Dec 30, 2024 · Updated last year
- EMNLP 2022: Analyzing and Evaluating Faithfulness in Dialogue Summarization ☆13 · Mar 20, 2025 · Updated 11 months ago
- The Conceptual Coverage Across Languages Benchmark for Text-to-Image Models ☆12 · Oct 28, 2024 · Updated last year
- Benchmark for evaluating open-ended generation ☆50 · Nov 6, 2024 · Updated last year
- Crawled Wikipedia Tables with Passages ☆13 · Aug 19, 2021 · Updated 4 years ago
- MetricEval: A framework that conceptualizes and operationalizes four main components of metric evaluation, in terms of reliability and va… ☆12 · Nov 6, 2023 · Updated 2 years ago
- ☆14 · Oct 17, 2023 · Updated 2 years ago
- ☆32 · Feb 8, 2024 · Updated 2 years ago
- "TIGERScore: Towards Building Explainable Metric for All Text Generation Tasks" [TMLR 2024] ☆32 · Dec 21, 2024 · Updated last year
- Code for paper 'Accelerating Antimicrobial Peptide Discovery with Latent Sequence-Structure Model' ☆12 · Mar 21, 2024 · Updated last year
- Code for ProTrix: Building Models for Planning and Reasoning over Tables with Sentence Context ☆18 · Nov 15, 2024 · Updated last year
- Code for SIGdial 2020 paper: Unsupervised Evaluation of Interactive Dialog with DialoGPT (https://arxiv.org/abs/2006.12719) ☆29 · Jun 8, 2020 · Updated 5 years ago
- PyTorch code for the paper "An Empirical Study of Multimodal Model Merging" ☆37 · Oct 11, 2023 · Updated 2 years ago
- This repo contains code for the paper "Psychologically-informed chain-of-thought prompts for metaphor understanding in large language mod… ☆14 · Apr 28, 2023 · Updated 2 years ago
- Model-in-the-loop approach for figurative language generation and explainability. Code and data for EMNLP 2022 paper FLUTE: Figurative Language Unders… ☆13 · Apr 22, 2023 · Updated 2 years ago
- Data-driven Summarization of Scientific Articles ☆10 · Jun 18, 2018 · Updated 7 years ago
- [ChatGPT4MTevaluation] ErrorAnalysis Prompt for MT Evaluation in ChatGPT ☆92 · Oct 14, 2025 · Updated 4 months ago
- ☆17 · Jul 18, 2022 · Updated 3 years ago
- Improving Word Translation via Two-Stage Contrastive Learning (ACL 2022). Keywords: Bilingual Lexicon Induction, Word Translation, Cross-… ☆36 · Jan 23, 2025 · Updated last year
- ☆62 · Oct 30, 2022 · Updated 3 years ago
- Resource, Evaluation and Detection Papers for ChatGPT ☆456 · Mar 21, 2024 · Updated last year
- [BMVC 2024 Oral ✨] Revisiting Image Captioning Training Paradigm via Direct CLIP-based Optimization ☆20 · Sep 11, 2024 · Updated last year
- Code release for "TempLM: Distilling Language Models into Template-Based Generators" ☆14 · Jul 21, 2022 · Updated 3 years ago
- Official implementation repository for the paper Towards General Conceptual Model Editing via Adversarial Representation Engineering. ☆19 · Dec 6, 2024 · Updated last year
- ☆17 · Oct 31, 2023 · Updated 2 years ago
- ☆144 · Sep 10, 2023 · Updated 2 years ago
- TISE: Bag of Metrics for Text-to-Image Synthesis Evaluation (ECCV 2022) ☆35 · Nov 12, 2024 · Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆45 · Nov 29, 2023 · Updated 2 years ago
- [ICLR 2025] Permute-and-Flip: An optimally robust and watermarkable decoder for LLMs ☆19 · Mar 20, 2025 · Updated 11 months ago
- WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000… ☆48 · Dec 7, 2023 · Updated 2 years ago
- Data and code for EMNLP 2023 System Demo Paper "QACHECK: A Demonstration System for Question-Guided Multi-Hop Fact-Checking" ☆19 · Dec 19, 2023 · Updated 2 years ago
- ☆18 · Oct 19, 2020 · Updated 5 years ago