CLARIN-PL / chatgpt-evaluation-01-2023
Code, datasets and results of the ChatGPT evaluation presented in the paper "ChatGPT: Jack of all trades, master of none"
☆29 Updated 2 years ago
Alternatives and similar repositories for chatgpt-evaluation-01-2023:
Users who are interested in chatgpt-evaluation-01-2023 are comparing it to the repositories listed below
- TBC ☆26 Updated 2 years ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 Updated 2 years ago
- Code for our paper: "GrIPS: Gradient-free, Edit-based Instruction Search for Prompting Large Language Models" ☆53 Updated 2 years ago
- Code for paper 'Data-Efficient FineTuning' ☆29 Updated last year
- This repository contains the code for "How many data points is a prompt worth?" ☆48 Updated 4 years ago
- Source code for SIGIR 2022 paper. ☆15 Updated 3 years ago
- Language Models of Code are Few-Shot Commonsense Learners (EMNLP 2022) ☆86 Updated 2 years ago
- ☆22 Updated 2 years ago
- Official repository for our EACL 2023 paper "LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization" (https… ☆43 Updated 8 months ago
- The LM Contamination Index is a manually created database of contamination evidence for LMs. ☆78 Updated last year
- Code for the ACL2022 paper "Synthetic Question Value Estimation for Domain Adaptation of Question Answering" ☆17 Updated 3 years ago
- Code and pre-trained models for "ReasonBert: Pre-trained to Reason with Distant Supervision", EMNLP'2021 ☆29 Updated 2 years ago
- Unifew: Unified Fewshot Learning Model ☆18 Updated 3 years ago
- ☆35 Updated last year
- Code and dataset for the EMNLP paper titled Instruct and Extract: Instruction Tuning for On-Demand Information Extraction ☆51 Updated last year
- A benchmark dataset for evaluating dialog system and natural language generation metrics. ☆36 Updated 2 years ago
- Code for the arXiv paper: "LLMs as Factual Reasoners: Insights from Existing Benchmarks and Beyond" ☆59 Updated 2 months ago
- Tasks for describing differences between text distributions. ☆16 Updated 8 months ago
- Repo for "On Learning to Summarize with Large Language Models as References" ☆44 Updated last year
- Data and code for the paper "The Moral Integrity Corpus: A Benchmark for Ethical Dialogue Systems" ☆19 Updated last year
- ☆12 Updated 3 years ago
- Detect hallucinated tokens for conditional sequence generation. ☆64 Updated 3 years ago
- PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialog… ☆27 Updated 3 years ago
- Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge" ☆76 Updated 2 years ago
- ☆31 Updated last year
- DEMix Layers for Modular Language Modeling ☆53 Updated 3 years ago
- Data Valuation on In-Context Examples (ACL23) ☆23 Updated 3 months ago
- The official implementation of "Evidentiality-guided Generation for Knowledge-Intensive NLP Tasks" (NAACL 2022). ☆43 Updated 2 years ago
- Code for paper "Do Language Models Have Beliefs? Methods for Detecting, Updating, and Visualizing Model Beliefs" ☆28 Updated 2 years ago
- [ACL 2023 Findings] What In-Context Learning “Learns” In-Context: Disentangling Task Recognition and Task Learning ☆21 Updated last year