kaushal0494 / UnifyingAITutorEvaluation
An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
☆21 Updated last month
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users who are interested in UnifyingAITutorEvaluation are comparing it to the libraries listed below.
- ☆30 Updated 10 months ago
- Resources for cultural NLP research ☆105 Updated last month
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆67 Updated last month
- The implementation of "RQUGE: Reference-Free Metric for Evaluating Question Generation by Answering the Question" [ACL 2023] ☆16 Updated last year
- ☆189 Updated 4 months ago
- Codebase, data and models for the SummaC paper in TACL ☆102 Updated 9 months ago
- This repository contains the data and code introduced in the paper "CrowS-Pairs: A Challenge Dataset for Measuring Social Biases in Maske… ☆125 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆132 Updated last year
- ACL 2022: An Empirical Survey of the Effectiveness of Debiasing Techniques for Pre-trained Language Models. ☆149 Updated 2 months ago
- SemEval2024-task8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection ☆77 Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆129 Updated last year
- NAACL 2024: SeaEval for Multilingual Foundation Models: From Cross-Lingual Alignment to Cultural Reasoning ☆26 Updated 8 months ago
- An Apache 2.0 fork of HuggingFace's Large Language Model Text Generation Inference ☆19 Updated last year
- The geometry of multilingual language model representations (EMNLP 2022). ☆22 Updated 3 years ago
- ☆47 Updated last month
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆397 Updated 6 months ago
- Awesome LLM for NLG Evaluation Papers ☆25 Updated last year
- This repository contains the dataset and code for "WiCE: Real-World Entailment for Claims in Wikipedia" in EMNLP 2023. ☆41 Updated last year
- Crosslingual Reasoning through Test-Time Scaling ☆19 Updated 5 months ago
- Token-level Reference-free Hallucination Detection ☆96 Updated 2 years ago
- Dataset associated with "BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation" paper ☆81 Updated 4 years ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆138 Updated last year
- ☆15 Updated 3 years ago
- ☆101 Updated last year
- ☆23 Updated 4 years ago
- Easy-to-use framework for evaluating cross-lingual consistency of factual knowledge (Supported LLaMA, BLOOM, mT5, RoBERTa, etc.) Paper he… ☆26 Updated 2 months ago
- ACL 2023: Evaluating Open-Domain Question Answering in the Era of Large Language Models ☆47 Updated last year
- Data and info for the paper "ParaDetox: Text Detoxification with Parallel Data" ☆32 Updated 7 months ago
- ☆85 Updated 10 months ago
- ACL 2023 paper "A Critical Evaluation of Evaluations for Long-form Question Answering" ☆21 Updated last year