An Evaluation Taxonomy for Pedagogical Ability Assessment of LLM-Powered AI Tutors
★27 · Mar 2, 2026 · Updated last month
Alternatives and similar repositories for UnifyingAITutorEvaluation
Users who are interested in UnifyingAITutorEvaluation are comparing it to the libraries listed below.
- NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake… ★45 · Jul 21, 2024 · Updated last year
- Benchmark for Measuring Open-ended Pedagogical Capabilities of LLM Tutors, EMNLP 2025 Oral ★34 · Nov 18, 2025 · Updated 4 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ★80 · Sep 17, 2025 · Updated 6 months ago
- Official code and data repository of MathChat: Benchmarking Mathematical Reasoning and Instruction Following in Multi-Turn Inte… ★22 · Jun 3, 2024 · Updated last year
- Python coherence evaluation tool using Stanford's CoreNLP. ★10 · Feb 2, 2020 · Updated 6 years ago
- A repo to keep all resources about interpretability in NLP organised and up to date ★12 · Nov 22, 2020 · Updated 5 years ago
- Improving word mover's distance by leveraging self-attention matrix (Published in EMNLP 2023 Findings) ★10 · Mar 10, 2026 · Updated last month
- ★13 · Sep 27, 2022 · Updated 3 years ago
- ★12 · Feb 16, 2024 · Updated 2 years ago
- ★11 · Dec 11, 2022 · Updated 3 years ago
- A dataset of LLM-generated chain-of-thought steps annotated with mistake location. ★87 · Aug 10, 2024 · Updated last year
- The Universal Anaphora Scorer ★15 · Sep 2, 2024 · Updated last year
- Face-preprocess-tools is a collection of tools for preprocessing face images. Images will go through FaceDetection, FaceAlignment, FaceCr… ★16 · May 3, 2016 · Updated 9 years ago
- The official PyTorch implementation of our IJCV-2025 paper "Learning with Enriched Inductive Biases for Vision-Language Models". ★15 · Mar 26, 2025 · Updated last year
- Code for the "Overcoming Sparsity Artifacts in Crosscoders to Interpret Chat-Tuning" paper. ★17 · Nov 21, 2025 · Updated 4 months ago
- 2019 PyCon kr tutorial: "Getting started with implementing NLP papers using Naver movie rating data" ★13 · Aug 21, 2019 · Updated 6 years ago
- code for EMNLP 2024 paper: Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis ★12 · Nov 17, 2024 · Updated last year
- code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models" ★19 · Mar 10, 2025 · Updated last year
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ★113 · Apr 19, 2025 · Updated 11 months ago
- [ACL 2024 Findings] The official repo for "ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large … ★25 · May 29, 2024 · Updated last year
- ★29 · May 7, 2024 · Updated last year
- ★14 · Sep 17, 2025 · Updated 6 months ago
- Official implementation of the ACL 2022 paper "Learning Non-Autoregressive Models from Search for Unsupervised Sentence Summarization" ★14 · Dec 26, 2022 · Updated 3 years ago
- [ICLR 2025 Oral] Knowledge Entropy Decay during Language Model Pretraining Hinders New Knowledge Acquisition ★17 · Nov 25, 2024 · Updated last year
- The TalkMoves Dataset: K-12 mathematics lesson transcripts annotated for teacher and student discursive moves ★36 · Feb 4, 2022 · Updated 4 years ago
- Code for the paper "Attention Temperature Matters in Abstractive Summarization Distillation" (https://arxiv.org/abs/2106.03441) ★13 · Mar 25, 2022 · Updated 4 years ago
- This module is a tool for calculating correlations such as Partial, Tetrachoric, Intraclass correlation coefficients, Bootstrap agreement… ★11 · Apr 1, 2026 · Updated 2 weeks ago
- Scripts for creating a Japanese-English parallel corpus and training NMT models ★18 · Nov 9, 2021 · Updated 4 years ago
- ★12 · Nov 21, 2024 · Updated last year
- Summarize a document conditioned on aspect keywords. ★17 · Sep 7, 2022 · Updated 3 years ago
- [ACL 2025] Are Your LLMs Capable of Stable Reasoning? ★32 · Aug 5, 2025 · Updated 8 months ago
- A pipeline for the automatic construction of geometry problems along with step-by-step solutions. ★17 · Aug 27, 2025 · Updated 7 months ago
- Argotario: a multi-lingual serious game to tackle fallacious argumentation ★16 · Oct 14, 2025 · Updated 6 months ago
- The Chrome Experience User Survey (CUES) extension. ★12 · Sep 23, 2015 · Updated 10 years ago
- gloss browser for https://github.com/foss-np/np-l10n-glossary ★10 · Oct 29, 2017 · Updated 8 years ago
- ACL 2022: Just Rank: Rethinking Evaluation with Word and Sentence Similarities ★35 · Dec 14, 2022 · Updated 3 years ago
- This is the official repository for the paper "MathCanvas: Intrinsic Visual Chain-of-Thought for Multimodal Mathematical Reasoning" ★67 · Dec 29, 2025 · Updated 3 months ago
- [EMNLP2022] Released code for paper "Distilling Causal Effect from Miscellaneous Other-Class for Continual Named Entity Recognition" ★22 · Feb 9, 2023 · Updated 3 years ago
- SCUT-EnsExam is a real-world handwritten text erasure dataset for examination paper scenarios, which consists of 545 examination paper im… ★19 · Dec 5, 2023 · Updated 2 years ago