HLTCHKUST / chatgpt-evaluation
This respository contains the code for extracting the test samples we used in our paper: "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity"
☆77Updated last year
Alternatives and similar repositories for chatgpt-evaluation:
Users that are interested in chatgpt-evaluation are comparing it to the libraries listed below
- This project maintains a reading list for general text generation tasks☆65Updated 3 years ago
- An Interpretable Neuro-Symbolic Framework for Task-Oriented Dialogue Generation☆23Updated 3 years ago
- reStructured Pre-training☆98Updated 2 years ago
- ☆26Updated 2 years ago
- DSTC10 Track1 - MOD: Internet Meme Incorporated Open-domain Dialog☆50Updated 2 years ago
- Official Code for "PPT: Pre-trained Prompt Tuning for Few-shot Learning". ACL 2022☆108Updated 2 years ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study☆43Updated 2 years ago
- ☆117Updated 2 years ago
- [NeurIPS 2022] Generating Training Data with Language Models: Towards Zero-Shot Language Understanding☆64Updated 2 years ago
- Lexically constrained text generation with CBART.☆48Updated 2 years ago
- EMNLP'2022: BERTScore is Unfair: On Social Bias in Language Model-Based Metrics for Text Generation☆41Updated 2 years ago
- Code for the paper Code for the paper InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning☆99Updated last year
- A toolkit for evaluation of natural language generation (NLG), including BLEU, ROUGE, METEOR, and CIDEr.☆31Updated 4 years ago
- ☆78Updated 2 years ago
- EMNLP 2022: ClidSum: A Benchmark Dataset for Cross-Lingual Dialogue Summarization☆35Updated last year
- ☆61Updated 2 years ago
- Paper list of "The Life Cycle of Knowledge in Big Language Models: A Survey"☆59Updated last year
- ⚡Research papers about leveraging the capabilities of language models⚡☆52Updated last year
- [ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering☆45Updated 2 years ago
- ☆36Updated last year
- [NAACL'22] TaCL: Improving BERT Pre-training with Token-aware Contrastive Learning☆93Updated 2 years ago
- We construct and introduce DIALFACT, a testing benchmark dataset crowd-annotated conversational claims, paired with pieces of evidence fr…☆41Updated 2 years ago
- This repository is the official implementation of our paper MVP: Multi-task Supervised Pre-training for Natural Language Generation.☆72Updated 2 years ago
- ☆88Updated 2 years ago
- Paradigm shift in natural language processing☆42Updated 2 years ago
- Github repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" (NeurIPS 2023)☆58Updated last year
- Official code for "Continual Prompt Tuning for Dialog State Tracking" (ACL 2022).☆27Updated 2 years ago
- First explanation metric (diagnostic report) for text generation evaluation☆61Updated last month
- [NeurIPS'22 Spotlight] Data and code for our paper CoNT: Contrastive Neural Text Generation☆152Updated last year
- Source code for the ACL-IJCNLP 2021 paper entitled "T-DNA: Taming Pre-trained Language Models with N-gram Representations for Low-Resourc…☆19Updated 2 years ago