Khan / tutoring-accuracy-dataset
This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the performance and challenges of Large Language Models (LLMs) in math tutoring scenarios, providing a benchmark dataset for evaluating LLM accuracy in educational contexts.
☆54 Updated last year
Alternatives and similar repositories for tutoring-accuracy-dataset
Users who are interested in tutoring-accuracy-dataset are comparing it to the repositories listed below.
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆102 Updated 7 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆45 Updated last year
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆70 Updated 2 months ago
- ☆35 Updated 2 years ago
- An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors ☆23 Updated 2 months ago
- Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors, EMNLP 2025 Oral ☆26 Updated 3 weeks ago
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ☆66 Updated 2 years ago
- potato: portable text annotation tool ☆356 Updated last week
- ☆100 Updated last year
- This repository contains the code and dataset for our paper titled Speaker and Time-aware Joint Contextual Learning for Dialogue-act Clas… ☆52 Updated 3 years ago
- Resources for cultural NLP research ☆110 Updated 2 months ago
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆40 Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆133 Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆133 Updated last year
- Official repository for the AnnoMI dataset: the first public collection of expert-annotated MI transcripts. ☆80 Updated 2 years ago
- Code for paper "G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment" ☆397 Updated last year
- ☆23 Updated 4 years ago
- [Data + code] ExpertQA : Expert-Curated Questions and Attributed Answers ☆135 Updated last year
- Codes for papers on Large Language Models Personalization (LaMP) ☆178 Updated 9 months ago
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" ☆24 Updated 7 months ago
- ☆46 Updated 2 years ago
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ☆128 Updated last year
- The Prism Alignment Project ☆86 Updated last year
- ☆116 Updated last year
- This is the data associated with the PERSUADE Corpus 2.0 version ☆47 Updated last year
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆224 Updated last year
- ☆270 Updated 10 months ago
- ☆50 Updated last year
- This repository contains a dataset containing ≈2K dialogues whose listener utterances are annotated from labels derived from the Motiva-… ☆19 Updated 2 years ago
- [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations. Paper: https://arxiv.org/abs/2305.14627 ☆501 Updated last year