kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
☆14 Updated last month
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users who are interested in UnifyingAITutorEvaluation are comparing it to the repositories listed below.
- The public data and evaluation scripts for the MLSP 2024 Shared Task ☆9 Updated 6 months ago
- Codebase, data and models for the SummaC paper in TACL ☆97 Updated 5 months ago
- Multilingual Large Language Models Evaluation Benchmark ☆127 Updated 10 months ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆122 Updated last year
- This repository contains a dataset containing ≈2K dialogues whose listener utterances are annotated from labels derived from the Motiva-… ☆17 Updated 2 years ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆55 Updated 4 months ago
- Data for evaluating gender bias in coreference resolution systems. ☆79 Updated 6 years ago
- Resources for cultural NLP research ☆98 Updated 2 months ago
- Crosslingual Reasoning through Test-Time Scaling ☆18 Updated 2 months ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆132 Updated last year
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" ☆24 Updated 2 months ago
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] ☆16 Updated last year
- Code associated with the ACL 2021 DExperts paper ☆115 Updated 2 years ago
- FRANK: Factuality Evaluation Benchmark ☆57 Updated 2 years ago
- Source Code of Paper "GPTScore: Evaluate as You Desire" ☆252 Updated 2 years ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆41 Updated 11 months ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆359 Updated 3 months ago
- Repository for the Bias Benchmark for QA dataset. ☆123 Updated last year
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ☆31 Updated 3 months ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆139 Updated 7 months ago
- Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper ☆299 Updated 2 months ago
- First explanation metric (diagnostic report) for text generation evaluation ☆62 Updated 4 months ago
- A reading list of up-to-date papers on NLP for Social Good. ☆304 Updated last year
- Repository for research in the field of Responsible NLP at Meta. ☆201 Updated 2 months ago