rosewang2008 / bridge
NAACL 2024. Code & Dataset for "Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes"
☆29 · Updated 3 months ago
Related projects
Alternatives and complementary repositories for bridge
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆77 · Updated 3 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆122 · Updated 7 months ago
- ☆61 · Updated 7 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆44 · Updated 8 months ago
- ☆94 · Updated 6 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆62 · Updated last year
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆78 · Updated 2 months ago
- Repository for the ACL 2024 conference website ☆17 · Updated last month
- Governance of the Commons Simulation (GovSim) ☆20 · Updated 3 months ago
- Benchmarking library for RAG ☆112 · Updated this week
- Functional Benchmarks and the Reasoning Gap ☆78 · Updated last month
- Inspecting and Editing Knowledge Representations in Language Models ☆107 · Updated last year
- ☆29 · Updated last year
- ☆21 · Updated 8 months ago
- ☆196 · Updated 2 weeks ago
- [ICLR 2023] Guess the Instruction! Flipped Learning Makes Language Models Stronger Zero-Shot Learners ☆111 · Updated last month
- Evaluating LLMs with fewer examples ☆133 · Updated 6 months ago
- The Prism Alignment Project ☆37 · Updated 6 months ago
- PRODIGy is a collection of dialogues in which each conversation is aligned with speaker profile representations. ☆16 · Updated 2 months ago
- The official code repo for "Sub-Sentence Encoder: Contrastive Learning of Propositional Semantic Representations". ☆75 · Updated 9 months ago
- A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper… ☆105 · Updated last month
- ☆26 · Updated 3 weeks ago
- ☆86 · Updated 5 months ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆51 · Updated 5 months ago
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆210 · Updated 10 months ago
- This repository hosts the paper "LLM Based Math Tutoring: Challenges and Dataset", along with the accompanying dataset. It explores the p… ☆28 · Updated 2 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆61 · Updated 4 months ago
- ☆18 · Updated last year
- Multilingual Large Language Models Evaluation Benchmark ☆105 · Updated 2 months ago
- ☆31 · Updated last year