kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
☆9 Updated 2 weeks ago
Alternatives and similar repositories for UnifyingAITutorEvaluation:
Users who are interested in UnifyingAITutorEvaluation are comparing it to the repositories listed below
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆51 Updated last month
- ☆21 Updated 3 years ago
- Code for the paper "REV: Information-Theoretic Evaluation of Free-Text Rationales" ☆15 Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆128 Updated last year
- Resources for cultural NLP research ☆92 Updated this week
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] ☆16 Updated last year
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆36 Updated 9 months ago
- Codebase, data and models for the SummaC paper in TACL ☆91 Updated 2 months ago
- Code and test data for "On Measuring Bias in Sentence Encoders", to appear at NAACL 2019. ☆54 Updated 3 years ago
- DSTC11 Track 5 - Task-oriented Conversational Modeling with Subjective Knowledge ☆45 Updated last year
- Code associated with ACL 2021 DExperts paper ☆114 Updated last year
- ☆15 Updated 2 years ago
- ☆20 Updated 4 months ago
- Code and data for Marked Personas (ACL 2023) ☆23 Updated last year
- Faithfulness and factuality annotations of XSum summaries from our paper "On Faithfulness and Factuality in Abstractive Summarization" (h… ☆81 Updated 4 years ago
- FRANK: Factuality Evaluation Benchmark ☆55 Updated 2 years ago
- UnQovering Stereotyping Biases via Underspecified Questions - EMNLP 2020 (Findings) ☆22 Updated 3 years ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆137 Updated 4 months ago
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" ☆24 Updated 2 years ago
- The official repo for SocKET: Social Knowledge Evaluation Tests ☆23 Updated last year
- Detect hallucinated tokens for conditional sequence generation. ☆64 Updated 3 years ago
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ☆41 Updated last year
- Can Large Language Models Be an Alternative to Human Evaluations? ☆9 Updated last year
- The official code of TACL 2021, "Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies". ☆68 Updated 2 years ago
- ☆46 Updated 2 years ago
- Code and dataset for the paper: Generating Literal and Implied Subquestions to Fact-check Complex Claims ☆26 Updated last year
- Token-level Reference-free Hallucination Detection ☆94 Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆83 Updated 8 months ago
- This repository contains a dataset containing ~2K dialogues whose listener utterances are annotated from labels derived from the Motiva-… ☆16 Updated 2 years ago
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆78 Updated last month