kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
☆18 Updated last month
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users interested in UnifyingAITutorEvaluation are comparing it to the libraries listed below.
- Resources for cultural NLP research ☆103 Updated 4 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆63 Updated this week
- Codebase, data and models for the SummaC paper in TACL ☆102 Updated 7 months ago
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] ☆16 Updated last year
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆124 Updated last year
- SemEval2024-task8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection ☆76 Updated last year
- Code and data for Marked Personas (ACL 2023) ☆28 Updated 2 years ago
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ☆32 Updated 5 months ago
- ☆23 Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆128 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆131 Updated last year
- ☆30 Updated 9 months ago
- ☆186 Updated 2 months ago
- ☆28 Updated 3 years ago
- A reading list of up-to-date papers on NLP for Social Good. ☆304 Updated 2 years ago
- ☆27 Updated 2 years ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆146 Updated last month
- This is a repository for sharing papers in the field of persona-based conversational AI. The related source code for each paper is linked… ☆167 Updated last year
- Crosslingual Reasoning through Test-Time Scaling ☆19 Updated 4 months ago
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆43 Updated last year
- This repository contains a dataset containing ≈2K dialogues whose listener utterances are annotated from labels derived from the Motiva-… ☆18 Updated 2 years ago
- ☆45 Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆380 Updated 5 months ago
- FRANK: Factuality Evaluation Benchmark ☆59 Updated 2 years ago
- Data for evaluating gender bias in coreference resolution systems. ☆80 Updated 6 years ago
- Find informative examples to efficiently (human)-evaluate NLG models. ☆16 Updated last month
- The geometry of multilingual language model representations (EMNLP 2022). ☆21 Updated 2 years ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆138 Updated last year
- ☆84 Updated 9 months ago
- An Apache 2.0 fork of HuggingFace's Large Language Model Text Generation Inference ☆20 Updated last year