Khan / tutoring-accuracy-dataset
This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the performance and challenges of Large Language Models (LLMs) in math tutoring scenarios, providing a benchmark dataset for evaluating LLM accuracy in educational contexts.
☆52 Updated last year
Alternatives and similar repositories for tutoring-accuracy-dataset
Users who are interested in tutoring-accuracy-dataset compare it to the repositories listed below.
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆100 Updated 5 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆65 Updated 3 weeks ago
- ☆34 Updated 2 years ago
- Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors, EMNLP 2025 ☆22 Updated last week
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆43 Updated last year
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ☆64 Updated 2 years ago
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" ☆24 Updated 5 months ago
- [NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting" ☆113 Updated 2 years ago
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆40 Updated last year
- ☆24 Updated 2 years ago
- ☆253 Updated 6 months ago
- ☆99 Updated last year
- potato: portable text annotation tool ☆352 Updated 2 weeks ago
- ☆115 Updated last year
- Official repository for the AnnoMI dataset: the first public collection of expert-annotated MI transcripts. ☆77 Updated 2 years ago
- This is the data associated with the PERSUADE Corpus 2.0 version ☆46 Updated 10 months ago
- Kim, J., Evans, J., & Schein, A. (2025). Linear Representations of Political Perspective Emerge in Large Language Models. ICLR. ☆20 Updated 6 months ago
- An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors ☆20 Updated last week
- ☆12 Updated 2 years ago
- Repository for the Bias Benchmark for QA dataset. ☆128 Updated last year
- The Synthetic-Persona-Chat dataset is a synthetically generated persona-based dialogue dataset. It extends the original Persona-Chat data… ☆101 Updated last year
- Ghostbuster: Detecting Text Ghostwritten by Large Language Models (NAACL 2024) ☆162 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆131 Updated last year
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆105 Updated last year
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆295 Updated last year
- Code and data for Marked Personas (ACL 2023) ☆28 Updated 2 years ago
- This is the code for our KILT leaderboard submissions (KGI + Re2G models). ☆157 Updated 3 weeks ago
- Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ☆381 Updated last year
- The official repo for SocKET: Social Knowledge Evaluation Tests ☆24 Updated 4 months ago
- Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition, TACL 2022 ☆168 Updated last year