kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
★20 · Updated last week
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users who are interested in UnifyingAITutorEvaluation are comparing it to the libraries listed below.
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ★65 · Updated 3 weeks ago
- Resources for cultural NLP research ★103 · Updated 2 weeks ago
- Multilingual Large Language Models Evaluation Benchmark ★132 · Updated last year
- ★30 · Updated 10 months ago
- Codebase, data and models for the SummaC paper in TACL ★102 · Updated 8 months ago
- ★24 · Updated last year
- An Apache 2.0 fork of HuggingFace's Large Language Model Text Generation Inference ★19 · Updated last year
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] ★16 · Updated last year
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ★124 · Updated last year
- ★46 · Updated 2 weeks ago
- ★19 · Updated last year
- Awesome LLM for NLG Evaluation Papers ★25 · Updated last year
- NAACL 2024: SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning ★26 · Updated 7 months ago
- FRANK: Factuality Evaluation Benchmark ★59 · Updated 2 years ago
- Dataset for NAACL 2021 paper: "QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization" ★137 · Updated 2 years ago
- A curated list of research papers and resources on Cultural LLM. ★50 · Updated last year
- ★13 · Updated last year
- Token-level Reference-free Hallucination Detection ★96 · Updated 2 years ago
- ★28 · Updated 3 years ago
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ★32 · Updated 6 months ago
- ★189 · Updated 3 months ago
- This is a repository for sharing papers in the field of persona-based conversational AI. The related source code for each paper is linked… ★166 · Updated last year
- ★84 · Updated 9 months ago
- ★18 · Updated 7 months ago
- Crosslingual Reasoning through Test-Time Scaling ★19 · Updated 5 months ago
- Code for Multilingual Eval of Generative AI paper published at EMNLP 2023 ★70 · Updated last year
- Repository for the Bias Benchmark for QA dataset. ★128 · Updated last year
- Code for the paper "HALoGEN: Fantastic LLM Hallucinations and Where To Find Them" ★21 · Updated 4 months ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ★148 · Updated last month
- ★15 · Updated 2 years ago