kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
☆ 23 · Updated 2 months ago
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users that are interested in UnifyingAITutorEvaluation are comparing it to the libraries listed below
- Resources for cultural NLP research · ☆ 110 · Updated 2 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 · ☆ 70 · Updated 2 months ago
- ☆ 47 · Updated 2 months ago
- ☆ 189 · Updated 5 months ago
- Multilingual Large Language Models Evaluation Benchmark · ☆ 133 · Updated last year
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] · ☆ 16 · Updated last year
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… · ☆ 45 · Updated last year
- A curated list of research papers and resources on Cultural LLM. · ☆ 52 · Updated last year
- The geometry of multilingual language model representations (EMNLP 2022). · ☆ 22 · Updated 3 years ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… · ☆ 410 · Updated 8 months ago
- First explanation metric (diagnostic report) for text generation evaluation · ☆ 62 · Updated 9 months ago
- Crosslingual Reasoning through Test-Time Scaling · ☆ 19 · Updated 7 months ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. · ☆ 152 · Updated 3 months ago
- ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models · ☆ 47 · Updated last year
- ☆ 19 · Updated 9 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" · ☆ 86 · Updated last year
- RARR: Researching and Revising What Language Models Say, Using Language Models · ☆ 49 · Updated 2 years ago
- ☆ 19 · Updated last year
- A comprehensive paper list of Reasoning over Tables. · ☆ 30 · Updated 3 years ago
- Token-level Reference-free Hallucination Detection · ☆ 97 · Updated 2 years ago
- ☆ 89 · Updated 11 months ago
- Repository for the Bias Benchmark for QA dataset. · ☆ 133 · Updated last year
- NAACL 2024: SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning · ☆ 26 · Updated 9 months ago
- Codebase, data and models for the SummaC paper in TACL · ☆ 105 · Updated 10 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers · ☆ 135 · Updated last year
- UnQovering Stereotyping Biases via Underspecified Questions - EMNLP 2020 (Findings) · ☆ 21 · Updated 4 years ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" · ☆ 215 · Updated last year
- ☆ 82 · Updated 2 years ago
- Source Code of Paper "GPTScore: Evaluate as You Desire" · ☆ 257 · Updated 2 years ago
- Awesome LLM for NLG Evaluation Papers · ☆ 25 · Updated last year