Khan / tutoring-accuracy-dataset
This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the performance and challenges of Large Language Models (LLMs) in math tutoring scenarios, providing a benchmark dataset for evaluating LLM accuracy in educational contexts.
☆28 Updated 2 months ago
Related projects
Alternatives and complementary repositories for tutoring-accuracy-dataset
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆77 Updated 3 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆29 Updated 3 months ago
- Retrieval Augmented Generation Generalized Evaluation Dataset ☆51 Updated last month
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆44 Updated 8 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 Updated 7 months ago
- Evaluating LLMs with fewer examples ☆134 Updated 7 months ago
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆29 Updated 8 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments ☆32 Updated last month
- Dataset and annotations for ASSETS 2022 publication ☆10 Updated 2 years ago
- The Synthetic-Persona-Chat dataset is a synthetically generated persona-based dialogue dataset. It extends the original Persona-Chat data… ☆76 Updated 10 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 Updated last year
- Functional Benchmarks and the Reasoning Gap ☆78 Updated last month
- 👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark" ☆20 Updated 9 months ago
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 Updated last year
- A set of utilities for running few-shot prompting experiments on large-language models ☆112 Updated last year
- The Prism Alignment Project ☆37 Updated 6 months ago
- Discovering Data-driven Hypotheses in the Wild ☆39 Updated 2 weeks ago
- Code and data for the paper "Measuring Conversational Uptake: A Case-Study on Student-Teacher Interactions" ☆21 Updated 2 years ago
- The repository contains the code and dataset for the Socratic Debugging task which is a novel task for Socratically Questioning Novice De… ☆13 Updated 7 months ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆51 Updated 5 months ago
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆111 Updated 2 months ago
- Retrieval-Augmented Generation battle! ☆44 Updated last month