eth-lre / mathtutorbench
Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors, EMNLP 2025
☆22 · Updated last week
Alternatives and similar repositories for mathtutorbench
Users who are interested in mathtutorbench are comparing it to the libraries listed below:
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆65 · Updated 3 weeks ago
- This repository hosts the paper “LLM Based Math Tutoring: Challenges and Dataset”, along with the accompanying dataset. It explores the p… ☆52 · Updated last year
- Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them ☆514 · Updated last year
- Official implementation for the paper "DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models" ☆518 · Updated 8 months ago
- A package to evaluate factuality of long-form generation. Original implementation of our EMNLP 2023 paper "FActScore: Fine-grained Atomic… ☆386 · Updated 5 months ago
- This is a collection of research papers for Self-Correcting Large Language Models with Automated Feedback. ☆552 · Updated 11 months ago
- Data and Code for Program of Thoughts [TMLR 2023] ☆287 · Updated last year
- RewardBench: the first evaluation tool for reward models. ☆639 · Updated 4 months ago
- Awesome LLM Self-Consistency: a curated list of Self-consistency in Large Language Models ☆109 · Updated 2 months ago
- This is the repository of HaluEval, a large-scale hallucination evaluation benchmark for Large Language Models. ☆514 · Updated last year
- MAD: The first work to explore Multi-Agent Debate with Large Language Models :D ☆441 · Updated 8 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ☆43 · Updated last year
- The awesome agents in the era of large language models ☆69 · Updated last year
- Repository for Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, ACL23 ☆235 · Updated last year
- Kim, J., Evans, J., & Schein, A. (2025). Linear Representations of Political Perspective Emerge in Large Language Models. ICLR. ☆20 · Updated 6 months ago
- LLM hallucination paper list ☆323 · Updated last year
- Codes for papers on Large Language Models Personalization (LaMP) ☆170 · Updated 7 months ago
- Code and data for "Lost in the Middle: How Language Models Use Long Contexts" ☆359 · Updated last year
- Inference-Time Intervention: Eliciting Truthful Answers from a Language Model ☆549 · Updated 8 months ago
- A new tool learning benchmark aiming at well-balanced stability and reality, based on ToolBench. ☆185 · Updated 5 months ago
- RECOMP: Improving Retrieval-Augmented LMs with Compression and Selective Augmentation. ☆140 · Updated 4 months ago
- Github repository for "RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models" ☆203 · Updated 10 months ago
- Official Repo for ICLR 2024 paper MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback by Xingyao Wang*, Ziha… ☆130 · Updated last year
- Repository for the Bias Benchmark for QA dataset. ☆128 · Updated last year
- Awesome papers for role-playing with language models ☆205 · Updated 11 months ago
- The papers are organized according to our survey: Evaluating Large Language Models: A Comprehensive Survey. ☆780 · Updated last year
- A curated list of LLM Interpretability related material: Tutorial, Library, Survey, Paper, Blog, etc. ☆273 · Updated 6 months ago