Khan / tutoring-accuracy-dataset
This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the performance and challenges of Large Language Models (LLMs) in math tutoring scenarios, providing a benchmark dataset for evaluating LLM accuracy in educational contexts.
☆51 Updated last year
Alternatives and similar repositories for tutoring-accuracy-dataset
Users who are interested in tutoring-accuracy-dataset are comparing it to the repositories listed below.
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆99 Updated 4 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆43 Updated last year
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆59 Updated 5 months ago
- ☆33 Updated 2 years ago
- ☆95 Updated last year
- Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors ☆20 Updated 3 months ago
- ☆247 Updated 5 months ago
- potato: portable text annotation tool ☆349 Updated last month
- Ghostbuster: Detecting Text Ghostwritten by Large Language Models (NAACL 2024) ☆162 Updated last year
- Fact-Checking the Output of Generative Large Language Models in both Annotation and Evaluation. ☆104 Updated last year
- ☆48 Updated 9 months ago
- ☆114 Updated last year
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆132 Updated last year
- ☆24 Updated 2 years ago
- Bayesian IRT models in Python ☆150 Updated 3 weeks ago
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆40 Updated last year
- ☆267 Updated 7 months ago
- ☆293 Updated last year
- Data for evaluating gender bias in coreference resolution systems. ☆80 Updated 6 years ago
- An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors ☆18 Updated 2 weeks ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆509 Updated last year
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" ☆24 Updated 4 months ago
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆369 Updated last week
- Code and data for "Measuring and Narrowing the Compositionality Gap in Language Models" ☆318 Updated last year
- paper list on reasoning in NLP ☆191 Updated 4 months ago
- The Synthetic-Persona-Chat dataset is a synthetically generated persona-based dialogue dataset. It extends the original Persona-Chat data… ☆98 Updated last year
- The dataset and code for paper: TheoremQA: A Theorem-driven Question Answering dataset ☆159 Updated last year
- A data set based on all arXiv publications, pre-processed for NLP, including structured full-text and citation network ☆293 Updated 11 months ago
- What's In My Big Data (WIMBD) - a toolkit for analyzing large text datasets ☆223 Updated 9 months ago
- ☆22 Updated 4 years ago