HLTCHKUST/chatgpt-evaluation


HLTCHKUST / chatgpt-evaluation

This repository contains the code for extracting the test samples we used in our paper: "A Multitask, Multilingual, Multimodal Evaluation of ChatGPT on Reasoning, Hallucination, and Interactivity".

☆80

Alternatives and similar repositories for chatgpt-evaluation

Users who are interested in chatgpt-evaluation are comparing it to the libraries listed below.


HLTCHKUST / KnowExpert
The implementation of the paper "Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters".
☆17 · May 24, 2022 · Updated 4 years ago
pkunlp-icler / MLS
Source code of our paper "Focus on the Target’s Vocabulary: Masked Label Smoothing for Machine Translation" @ ACL 2022
☆13 · Apr 13, 2022 · Updated 4 years ago
27182812 / ChineseBERT_paddle
A Paddle reproduction of the paper "ChineseBERT: Chinese Pretraining Enhanced by Glyph and Pinyin Information" (ACL 2021)
☆10 · Nov 15, 2021 · Updated 4 years ago
HLTCHKUST / Perplexity-FactChecking
Towards Few-Shot Fact-Checking via Perplexity
☆13 · Jun 11, 2021 · Updated 5 years ago
HLTCHKUST / CAiRE_in_DialDoc21
CAiRE in DialDoc21: Data Augmentation for Information-Seeking Dialogue System
☆11 · May 24, 2022 · Updated 4 years ago
zlinao / VGLM
Versatile Generative Language Model
☆25 · Oct 29, 2022 · Updated 3 years ago
nuaa-nlp / Multimodality
☆15 · Dec 10, 2021 · Updated 4 years ago
vojtsek / to-llm-bot
☆38 · Aug 20, 2024 · Updated last year
s-nlp / PsiloQA
The PsiloQA pipeline automates the construction of a multilingual, span-level hallucination detection dataset with contexts.
☆16 · Apr 24, 2026 · Updated 3 months ago
zhangxy-2019 / sgp-tod
☆14 · Aug 21, 2025 · Updated 11 months ago
VanderpoelLiam / CPMI
Mutual Information Predicts Hallucinations in Abstractive Summarization
☆13 · Nov 14, 2022 · Updated 3 years ago
Tomiinek / MultiWOZ_Evaluation
Unified MultiWOZ evaluation scripts for the context-to-response task.
☆59 · Oct 11, 2023 · Updated 2 years ago
yueyu1030 / Patron
[ACL 2023] The code for our ACL'23 paper Cold-Start Data Selection for Few-shot Language Model Fine-tuning: A Prompt-Based Uncertainty Pr…
☆24 · Jun 1, 2024 · Updated 2 years ago
wxjiao / InstructMT
A collection of instruction data and scripts for machine translation.
☆20 · Sep 23, 2023 · Updated 2 years ago
ujiuji1259 / biocom
☆10 · Jun 16, 2021 · Updated 5 years ago
UKPLab / emnlp2021-prompt-ft-heuristics
☆10 · Sep 27, 2021 · Updated 4 years ago
violet-zct / fairseq-detect-hallucination
Detect hallucinated tokens for conditional sequence generation.
☆64 · Apr 15, 2022 · Updated 4 years ago
HKUST-KnowComp / SubeventWriter
Official code repository for the main conference paper in EMNLP 2022: SubeventWriter: Iterative Sub-event Sequence Generation with Cohere…
☆11 · Oct 16, 2022 · Updated 3 years ago
sairin1202 / SciXGen
Dataset and model in the paper "SciXGen: A Scientific Paper Dataset for Context-Aware Text Generation"
☆13 · Feb 14, 2022 · Updated 4 years ago
ygan / Spider-Syn
☆21 · Oct 22, 2021 · Updated 4 years ago
yuexihang / DeltaPhi
Implementation for "DeltaPhi: Learning Physical Trajectory Residual for PDE Solving"
☆13 · Jun 17, 2024 · Updated 2 years ago
cambridgeltl / multi3woz
The official repository for Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapte…
☆17 · Jan 15, 2024 · Updated 2 years ago
Alsace08 / SumCoT
[ACL 2023] Code and Data Repo for Paper "Element-aware Summary and Summary Chain-of-Thought (SumCoT)"
☆54 · Jan 21, 2024 · Updated 2 years ago
shadowkiller33 / ParaScore
☆31 · Apr 14, 2023 · Updated 3 years ago
WHU-ZQH / ChatGPT-vs.-BERT
🎁[ChatGPT4NLU] A Comparative Study on ChatGPT and Fine-tuned BERT
☆191 · Apr 17, 2023 · Updated 3 years ago
hitz-zentroa / lm-contamination
The LM Contamination Index is a manually created database of contamination evidences for LMs.
☆81 · Apr 11, 2024 · Updated 2 years ago
gentaiscool / few-shot-lm
The source code of "Language Models are Few-shot Multilingual Learners" (MRL @ EMNLP 2021)
☆52 · Jun 12, 2022 · Updated 4 years ago
OFA-Sys / OFA-Compress
OFA-Compress is a unified framework which provides OFA model finetuning, distillation and inference capabilities in Huggingface version, …
☆29 · Sep 22, 2022 · Updated 3 years ago
HanNight / AdaCAD
Code for NAACL 2025 paper "AdaCAD: Adaptively Decoding to Balance Conflicts between Contextual and Parametric Knowledge"
☆16 · Mar 2, 2026 · Updated 4 months ago
ou-medinfo / medbertjp
Trials of pre-trained BERT models for the medical domain in Japanese.
☆13 · Nov 21, 2020 · Updated 5 years ago
zjwang21 / StrokeNet
The official code for our EMNLP 2022 long paper [Breaking the Representation Bottleneck of Chinese Characters: Neural Machine Translation…
☆27 · Sep 10, 2025 · Updated 10 months ago
HypherX / Evolution-Analysis
☆25 · Dec 13, 2024 · Updated last year
cordercorder / knn-models
A retrieval augmented sequence modeling toolkit implemented based on Fairseq
☆29 · Mar 3, 2023 · Updated 3 years ago
THU-KEG / EvaluationPapers4ChatGPT
Resource, Evaluation and Detection Papers for ChatGPT
☆456 · Mar 21, 2024 · Updated 2 years ago
Aunsiels / CSK
Code for generating Quasimodo, a commonsense knowledge base.
☆20 · Sep 14, 2021 · Updated 4 years ago
gmum / dl-mo-2021
Deep Learning with Multiple Objectives: 2021 edition
☆10 · May 27, 2021 · Updated 5 years ago
CLARIN-PL / chatgpt-evaluation-01-2023
Code, datasets and results of the ChatGPT evaluation presented in paper "ChatGPT: Jack of all trades, master of none"
☆29 · Mar 7, 2023 · Updated 3 years ago
allenai / everyday-things
☆17 · Dec 6, 2023 · Updated 2 years ago
ictnlp / FA-DAT
Official Implementation for the ICLR2023 paper "Fuzzy Alignments in Directed Acyclic Graph for Non-autoregressive Machine Translation"
☆14 · Mar 1, 2023 · Updated 3 years ago