Khan / tutoring-accuracy-dataset
This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the performance and challenges of Large Language Models (LLMs) in math tutoring scenarios, providing a benchmark dataset for evaluating LLM accuracy in educational contexts.
☆54 Updated last year
Alternatives and similar repositories for tutoring-accuracy-dataset
Users who are interested in tutoring-accuracy-dataset are comparing it to the repositories listed below.
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆68 Updated 2 months ago
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆101 Updated 7 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆44 Updated last year
- ☆35 Updated 2 years ago
- ☆24 Updated 2 years ago
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆40 Updated last year
- ☆100 Updated last year
- An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors ☆21 Updated last month
- Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors, EMNLP 2025 ☆24 Updated last month
- Multilingual Large Language Models Evaluation Benchmark ☆133 Updated last year
- ☆116 Updated last year
- [NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting" ☆112 Updated 2 years ago
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆211 Updated 11 months ago
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ☆65 Updated 2 years ago
- The Synthetic-Persona-Chat dataset is a synthetically generated persona-based dialogue dataset. It extends the original Persona-Chat data… ☆104 Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆223 Updated last year
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆135 Updated last year
- A large-scale information-rich web dataset, featuring millions of real clicked query-document labels ☆345 Updated 11 months ago
- ☆45 Updated 2 years ago
- ☆256 Updated 7 months ago
- ☆270 Updated 9 months ago
- The official repo for SocKET: Social Knowledge Evaluation Tests ☆24 Updated 6 months ago
- A curated list of research papers and resources on Cultural LLM. ☆52 Updated last year
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" ☆24 Updated 6 months ago
- ☆23 Updated 4 years ago
- The Prism Alignment Project ☆86 Updated last year
- ACL2023 - AlignScore, a metric for factual consistency evaluation. ☆143 Updated last year
- Repository for MuSiQue: Multi-hop Questions via Single-hop Question Composition, TACL 2022 ☆177 Updated last year
- ☆46 Updated last year
- Official repository for the AnnoMI dataset: the first public collection of expert-annotated MI transcripts. ☆78 Updated 2 years ago