kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
☆24 · Updated 2 weeks ago
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users who are interested in UnifyingAITutorEvaluation are comparing it to the repositories listed below.
- Resources for cultural NLP research ☆113 · Updated 3 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆72 · Updated 3 months ago
- ☆189 · Updated 6 months ago
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] ☆16 · Updated last year
- ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models ☆47 · Updated last year
- Codebase, data and models for the SummaC paper in TACL ☆106 · Updated 11 months ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆414 · Updated 8 months ago
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆153 · Updated 4 months ago
- ☆47 · Updated 3 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆81 · Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆133 · Updated last year
- Companion code for FanOutQA: Multi-Hop, Multi-Document Question Answering for Large Language Models (ACL 2024) ☆58 · Updated 3 months ago
- GitHub repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆215 · Updated last year
- NAACL 2021: Are NLP Models really able to Solve Simple Math Word Problems? ☆136 · Updated 3 years ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆127 · Updated last year
- ☆89 · Updated last year
- Data for evaluating gender bias in coreference resolution systems. ☆81 · Updated 6 years ago
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆45 · Updated last year
- UnQovering Stereotyping Biases via Underspecified Questions - EMNLP 2020 (Findings) ☆21 · Updated 4 years ago
- Awesome LLM for NLG Evaluation Papers ☆25 · Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆136 · Updated last year
- Source Code of Paper "GPTScore: Evaluate as You Desire" ☆257 · Updated 2 years ago
- A curated list of research papers and resources on Cultural LLM. ☆52 · Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆225 · Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆133 · Updated last year
- This is the code for our KILT leaderboard submissions (KGI + Re2G models). ☆157 · Updated 3 months ago
- ☆116 · Updated last year
- Code and data associated with the AmbiEnt dataset in "We're Afraid Language Models Aren't Modeling Ambiguity" (Liu et al., 2023) ☆64 · Updated last year
- ACL 2023: AlignScore, a metric for factual consistency evaluation. ☆148 · Updated last year
- Repository for EMNLP 2022 Paper: Towards a Unified Multi-Dimensional Evaluator for Text Generation ☆214 · Updated last year