Khan / tutoring-accuracy-dataset
This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the performance and challenges of Large Language Models (LLMs) in math tutoring scenarios, providing a benchmark dataset for evaluating LLM accuracy in educational contexts.
☆48 · Updated 10 months ago
Alternatives and similar repositories for tutoring-accuracy-dataset
Users who are interested in tutoring-accuracy-dataset are comparing it to the repositories listed below:
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆97 · Updated 2 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆55 · Updated 4 months ago
- ☆33 · Updated 2 years ago
- Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors ☆17 · Updated 2 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆41 · Updated 11 months ago
- potato: portable text annotation tool ☆339 · Updated last week
- An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors ☆14 · Updated last month
- ☆24 · Updated 2 years ago
- ☆239 · Updated 3 months ago
- ☆22 · Updated 4 years ago
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" ☆24 · Updated 2 months ago
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ☆63 · Updated last year
- ☆110 · Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆123 · Updated last year
- Ghostbuster: Detecting Text Ghostwritten by Large Language Models (NAACL 2024) ☆160 · Updated last year
- This is the data associated with the PERSUADE Corpus 2.0 version ☆43 · Updated 8 months ago
- The Prism Alignment Project ☆79 · Updated last year
- The repository contains the code and dataset for the Socratic Debugging task which is a novel task for Socratically Questioning Novice De… ☆18 · Updated last year
- ☆44 · Updated 7 months ago
- Multilingual Large Language Models Evaluation Benchmark ☆127 · Updated 10 months ago
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆37 · Updated last year
- The official repo for SocKET: Social Knowledge Evaluation Tests ☆23 · Updated 2 months ago
- ☆94 · Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆131 · Updated last year
- ☆215 · Updated 4 years ago
- ☆268 · Updated 5 months ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆221 · Updated 8 months ago
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆132 · Updated last year
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆102 · Updated last year
- Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ☆361 · Updated last year