kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
⭐ 11 · Updated 2 weeks ago
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users who are interested in UnifyingAITutorEvaluation are comparing it to the libraries listed below.
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 · ⭐ 52 · Updated 2 months ago
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… · ⭐ 39 · Updated 9 months ago
- Codebase, data and models for the SummaC paper in TACL · ⭐ 93 · Updated 3 months ago
- ⭐ 21 · Updated 3 years ago
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] · ⭐ 16 · Updated last year
- Code associated with the ACL 2021 DExperts paper · ⭐ 115 · Updated last year
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" · ⭐ 15 · Updated last year
- FRANK: Factuality Evaluation Benchmark · ⭐ 55 · Updated 2 years ago
- This repository contains a dataset containing ~2K dialogues whose listener utterances are annotated from labels derived from the Motiva-… · ⭐ 16 · Updated 2 years ago
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" · ⭐ 30 · Updated last month
- First explanation metric (diagnostic report) for text generation evaluation · ⭐ 61 · Updated 2 months ago
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" · ⭐ 24 · Updated 3 weeks ago
- ⭐ 11 · Updated 3 years ago
- ⭐ 97 · Updated last year
- ⭐ 92 · Updated 2 years ago
- Dataset for NAACL 2021 paper: "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization" · ⭐ 120 · Updated last year
- Code and test data for "On Measuring Bias in Sentence Encoders", to appear at NAACL 2019. · ⭐ 54 · Updated 3 years ago
- ⭐ 48 · Updated 2 years ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" · ⭐ 83 · Updated 9 months ago
- ⭐ 27 · Updated 2 years ago
- ⭐ 48 · Updated 2 years ago
- ⭐ 133 · Updated 4 months ago
- A Large-Scale Dataset for Empathetic Response Generation · ⭐ 41 · Updated last year
- Can Large Language Models Be an Alternative to Human Evaluations? · ⭐ 9 · Updated last year
- Data for evaluating gender bias in coreference resolution systems. · ⭐ 77 · Updated 6 years ago
- ⭐ 20 · Updated 5 months ago
- Awesome LLM for NLG Evaluation Papers · ⭐ 24 · Updated last year
- ⭐ 19 · Updated last year
- ⭐ 71 · Updated 3 years ago
- DSTC11 Track 5 - Task-oriented Conversational Modeling with Subjective Knowledge · ⭐ 45 · Updated last year