This respository contains the code for extracting the test samples we used in our paper: "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity"
☆80Nov 24, 2023Updated 2 years ago
Alternatives and similar repositories for chatgpt-evaluation
Users that are interested in chatgpt-evaluation are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The implementation of the paper "Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters".☆17May 24, 2022Updated 3 years ago
- Source code of our paper "Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation" @ ACL 2022☆13Apr 13, 2022Updated 4 years ago
- 用Paddle复现论文ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information(ACL2021)☆10Nov 15, 2021Updated 4 years ago
- Towards Few-Shot Fact-Checking via Perplexity☆13Jun 11, 2021Updated 4 years ago
- CAiRE in DialDoc21: Data Augmentation for Information-SeekingDialogue System☆11May 24, 2022Updated 3 years ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- Versatile Generative Language Model☆25Oct 29, 2022Updated 3 years ago
- ☆15Dec 10, 2021Updated 4 years ago
- Unified MultiWOZ evaluation scripts for the context-to-response task.☆59Oct 11, 2023Updated 2 years ago
- ☆14Aug 21, 2025Updated 8 months ago
- [EACL'23] MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages☆23Feb 13, 2023Updated 3 years ago
- Mutual Information Predicts Hallucinations in Abstractive Summarization☆13Nov 14, 2022Updated 3 years ago
- [ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Pr…☆24Jun 1, 2024Updated last year
- A collection of instruction data and scripts for machine translation.☆20Sep 23, 2023Updated 2 years ago
- ☆25Dec 13, 2024Updated last year
- Deploy on Railway without the complexity - Free Credits Offer • AdConnect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
- The official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapte…☆17Jan 15, 2024Updated 2 years ago
- Interpretable unified language safety checking with large language models☆32Apr 15, 2023Updated 3 years ago
- ☆10Sep 27, 2021Updated 4 years ago
- ☆15Oct 20, 2023Updated 2 years ago
- Detect hallucinated tokens for conditional sequence generation.☆64Apr 15, 2022Updated 4 years ago
- The code for paper "ProQA: Structural Prompt-based Pre-training for Unified Question Answering"☆11Feb 7, 2023Updated 3 years ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning☆30Mar 5, 2024Updated 2 years ago
- Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"☆13Feb 14, 2022Updated 4 years ago
- Code for generating Quasimodo, a commonsense knowledge base.☆20Sep 14, 2021Updated 4 years ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- 🎁[ChatGPT4NLU] A Comparative Study on ChatGPT and Fine-tuned BERT☆191Apr 17, 2023Updated 3 years ago
- [ACL 2023] Code and Data Repo for Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)"☆53Jan 21, 2024Updated 2 years ago
- The LM Contamination Index is a manually created database of contamination evidences for LMs.☆82Apr 11, 2024Updated 2 years ago
- Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"☆17Mar 2, 2026Updated 2 months ago
- Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits