kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
★12 · Updated last month
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users who are interested in UnifyingAITutorEvaluation are comparing it to the libraries listed below.
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 · ★54 · Updated 3 months ago
- ★20 · Updated 6 months ago
- Resources for cultural NLP research · ★96 · Updated last month
- FRANK: Factuality Evaluation Benchmark · ★55 · Updated 2 years ago
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] · ★16 · Updated last year
- Resources for paper "DialSummEval: Revisiting summarization evaluation for dialogues" · ★15 · Updated 2 years ago
- Codebase, data and models for the SummaC paper in TACL · ★94 · Updated 4 months ago
- ★21 · Updated 3 years ago
- ★11 · Updated 3 years ago
- Can Large Language Models Be an Alternative to Human Evaluations? · ★9 · Updated last year
- ★82 · Updated 2 years ago
- ★40 · Updated last year
- ★98 · Updated last year
- ★92 · Updated 3 years ago
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" · ★30 · Updated 2 months ago
- Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (h… · ★82 · Updated 4 years ago
- An original implementation of the paper "CREPE: Open-Domain Question Answering with False Presuppositions" · ★16 · Updated 7 months ago
- ★26 · Updated 2 years ago
- This repository contains the two datasets introduced in the paper "Making Science Simple: Corpora for the Lay Summarisation of Scientific… · ★25 · Updated last year
- ★59 · Updated 6 months ago
- Token-level Reference-free Hallucination Detection · ★94 · Updated last year
- The dataset and code for PeerSum at EMNLP'23. · ★14 · Updated last year
- ★62 · Updated 2 years ago
- Code and model checkpoints for the MultiVerS model for scientific claim verification. · ★45 · Updated last year
- First explanation metric (diagnostic report) for text generation evaluation · ★62 · Updated 3 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" · ★84 · Updated 9 months ago
- A repository with several curated datasets of counter-narratives to fight online hate speech. · ★89 · Updated 2 years ago
- Resources for the shared task on conversational question answering SCAI-QReCC 2021 · ★29 · Updated 2 years ago
- Code and data for Marked Personas (ACL 2023) · ★24 · Updated 2 years ago
- Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper · ★297 · Updated last month