Spico197 / awesome-lm-evaluation
🩺 A collection of ChatGPT evaluation reports on various benchmarks.
☆49 Updated 2 years ago
Alternatives and similar repositories for awesome-lm-evaluation
Users who are interested in awesome-lm-evaluation are comparing it to the repositories listed below.
- PyTorch code for EMNLP 2021 paper: Don't be Contradicted with Anything! CI-ToD: Towards Benchmarking Consistency for Task-oriented Dialog… ☆27 Updated 3 years ago
- The code implementation of the EMNLP2022 paper: DisCup: Discriminator Cooperative Unlikelihood Prompt-tuning for Controllable Text Gene… ☆26 Updated last year
- First explanation metric (diagnostic report) for text generation evaluation ☆61 Updated 2 months ago
- Technical Report: Is ChatGPT a Good NLG Evaluator? A Preliminary Study ☆43 Updated 2 years ago
- [Findings of ACL'2023] Improving Contrastive Learning of Sentence Embeddings from AI Feedback ☆39 Updated last year
- ☆14 Updated 2 years ago
- ☆16 Updated 2 months ago
- Calculate the probability of a paper being accepted by EMNLP2023 based on score distribution of ACL2023. ☆14 Updated last year
- Resources for our ACL 2023 paper: Distilling Script Knowledge from Large Language Models for Constrained Language Planning ☆36 Updated last year
- ☆71 Updated 2 years ago
- ☆42 Updated last year
- ✍️ ChatGPT as a writing partner. ☆14 Updated 2 years ago
- ☆32 Updated last year
- Code for M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models ☆23 Updated 9 months ago
- Official code for "Continual Prompt Tuning for Dialog State Tracking" (ACL 2022). ☆27 Updated 2 years ago
- This project maintains a reading list for general text generation tasks ☆65 Updated 3 years ago
- 🍼 Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts ☆38 Updated 7 months ago
- Code and data for the paper "Can Large Language Models Understand Real-World Complex Instructions?" (AAAI2024) ☆48 Updated last year
- [ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering ☆45 Updated 2 years ago
- Dataset and baseline for Coling 2022 long paper (oral): "ConFiguRe: Exploring Discourse-level Chinese Figures of Speech" ☆11 Updated last year
- Implementation of "Investigating the Factual Knowledge Boundary of Large Language Models with Retrieval Augmentation" ☆80 Updated last year
- Code for Teaching LM to Translate with Comparison ☆39 Updated last year
- Code for our SIGIR 2022 accepted paper: P3 Ranker: Mitigating the Gaps between Pre-training and Ranking Fine-tuning with Prompt-based L… ☆17 Updated last year
- [ICLR24] The open-source repo of THU-KEG's KoLA benchmark. ☆50 Updated last year
- UNISUMM: Unified Few-shot Summarization with Multi-Task Pre-Training and Prefix-Tuning ☆60 Updated last year
- ☆61 Updated 2 years ago
- ☆21 Updated 2 years ago
- A toolkit for evaluation of natural language generation (NLG), including BLEU, ROUGE, METEOR, and CIDEr. ☆31 Updated 4 years ago
- Target-oriented Proactive Dialogue Systems with Personalization: Problem Formulation and Dataset Curation (EMNLP 2023) ☆30 Updated last year
- Implementation of "The Power of Scale for Parameter-Efficient Prompt Tuning" ☆57 Updated 2 years ago