Khan / tutoring-accuracy-dataset
This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the performance and challenges of Large Language Models (LLMs) in math tutoring scenarios, providing a benchmark dataset for evaluating LLM accuracy in educational contexts.
☆35 · Updated 4 months ago
Alternatives and similar repositories for tutoring-accuracy-dataset:
Users who are interested in tutoring-accuracy-dataset are comparing it to the repositories listed below.
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆82 · Updated 5 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆31 · Updated 6 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆45 · Updated 10 months ago
- A Computational Framework for Behavioral Assessment of LLM Therapists ☆24 · Updated 3 months ago
- Data for evaluating gender bias in coreference resolution systems. ☆72 · Updated 5 years ago
- Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM (CHI 2024 paper). LLooM automatically surfaces high-l… ☆71 · Updated last month
- Code for "From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Mod… ☆31 · Updated 10 months ago
- A corpus and code for understanding norms and subjectivity. 🤖 ☆45 · Updated 3 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆124 · Updated 10 months ago
- Codes and Datasets for our ACL 2023 paper on cognitive reframing of negative thoughts ☆56 · Updated last year
- Code and data accompanying the paper "TRUE: Re-evaluating Factual Consistency Evaluation". ☆73 · Updated 2 months ago
- The AI Knowledge Editor ☆182 · Updated 2 years ago
- SeeGULL is a broad-coverage stereotype dataset in English containing stereotypes about identity groups spanning 178 countries across 8 di… ☆33 · Updated last year
- Resources for cultural NLP research ☆77 · Updated 2 months ago
- The codebase for our ACL2023 paper: Did You Read the Instructions? Rethinking the Effectiveness of Task Definitions in Instruction Learni… ☆29 · Updated last year
- Code of ICLR paper: https://openreview.net/forum?id=-cqvvvb-NkI ☆91 · Updated last year
- The Prism Alignment Project ☆62 · Updated 8 months ago
- 👩‍💻 Code for the ACL paper "Detecting Edit Failures in LLMs: An Improved Specificity Benchmark" ☆20 · Updated last year
- 🌾 Universal, customizable and deployable fine-grained evaluation for text generation. ☆22 · Updated last year
- Generating claims for zero-shot scientific fact checking ☆29 · Updated 2 years ago
- An Education Tutoring Chatbot based on Learning Science Principles powered by Large Language Models ☆47 · Updated 2 months ago