eth-lre / mathtutorbench
Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors, EMNLP 2025
☆21 Updated last week
Alternatives and similar repositories for mathtutorbench
Users interested in mathtutorbench are comparing it to the repositories listed below.
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆61 Updated 6 months ago
- This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the p… ☆52 Updated last year
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆380 Updated 5 months ago
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆511 Updated last year
- MAD: The first work to explore Multi-Agent Debate with Large Language Models :D ☆434 Updated 8 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆43 Updated last year
- Code and data for "Lost in the Middle: How Language Models Use Long Contexts" ☆359 Updated last year
- Repository for Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, ACL23 ☆233 Updated last year
- Prod Env ☆429 Updated last year
- ☆287 Updated last year
- This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback. ☆547 Updated 10 months ago
- Codes for papers on Large Language Models Personalization (LaMP) ☆169 Updated 7 months ago
- Source Code of Paper "GPTScore: Evaluate as You Desire" ☆256 Updated 2 years ago
- ☆99 Updated 11 months ago
- [NeurIPS 2023] Codebase for the paper: "Guiding Large Language Models with Directional Stimulus Prompting" ☆113 Updated 2 years ago
- ☆34 Updated 2 years ago
- Forward-Looking Active REtrieval-augmented generation (FLARE) ☆651 Updated last year
- LLMs can generate feedback on their work, use it to improve the output, and repeat this process iteratively. ☆739 Updated 11 months ago
- Sotopia: an Open-ended Social Learning Environment (ICLR 2024 spotlight) ☆244 Updated last month
- Data and code for FreshLLMs (https://arxiv.org/abs/2310.03214) ☆372 Updated this week
- This is a repository for sharing papers in the field of persona-based conversational AI. The related source code for each paper is linked… ☆167 Updated last year
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆513 Updated 8 months ago
- RewardBench: the first evaluation tool for reward models. ☆634 Updated 3 months ago
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆548 Updated 7 months ago
- This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models. ☆510 Updated last year
- ☆280 Updated 8 months ago
- The awesome agents in the era of large language models ☆69 Updated last year
- A curated list of Human Preference Datasets for LLM fine-tuning, RLHF, and eval. ☆378 Updated last year
- ConvGQR: Generative Query Reformulation for Conversational Search. A codebase for ACL 2023 accepted paper. ☆32 Updated last year
- ☆36 Updated 2 years ago