kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
☆17 Updated 2 months ago
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users who are interested in UnifyingAITutorEvaluation are comparing it to the repositories listed below
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆58 Updated 5 months ago
- Resources for cultural NLP research ☆101 Updated 3 months ago
- Codebase, data and models for the SummaC paper in TACL ☆98 Updated 6 months ago
- ☆22 Updated 4 years ago
- Crosslingual Reasoning through Test-Time Scaling ☆18 Updated 2 months ago
- ☆23 Updated 8 months ago
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] ☆16 Updated last year
- ☆45 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆128 Updated 11 months ago
- Repository for the Bias Benchmark for QA dataset. ☆124 Updated last year
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆43 Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆366 Updated 3 months ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆122 Updated last year
- An Apache 2.0 fork of HuggingFace's Large Language Model Text Generation Inference ☆20 Updated last year
- RARR: Researching and Revising What Language Models Say, Using Language Models ☆48 Updated 2 years ago
- FRANK: Factuality Evaluation Benchmark ☆57 Updated 2 years ago
- ☆183 Updated last month
- A comprehensive paper list of Reasoning over Tables. ☆28 Updated 2 years ago
- ☆22 Updated last year
- ☆28 Updated 3 years ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆136 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆143 Updated 7 months ago
- Source Code of Paper "GPTScore: Evaluate as You Desire" ☆254 Updated 2 years ago
- Resources for the "Evaluating the Factual Consistency of Abstractive Text Summarization" paper ☆303 Updated 3 months ago
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ☆31 Updated 4 months ago
- Data for evaluating gender bias in coreference resolution systems. ☆79 Updated 6 years ago
- Awesome LLM for NLG Evaluation Papers ☆24 Updated last year
- This repository contains a dataset containing ≈2K dialogues whose listener utterances are annotated from labels derived from the Motiva-… ☆17 Updated 2 years ago
- ☆61 Updated 8 months ago
- ☆15 Updated 2 months ago