rosewang2008 / bridge
NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistakes"
☆31 · Updated 6 months ago
Alternatives and similar repositories for bridge:
Users who are interested in bridge are comparing it to the libraries listed below.
- Edu-ConvoKit: An Open-Source Framework for Education Conversation Data ☆82 · Updated 5 months ago
- Code and Data for "Evaluating Correctness and Faithfulness of Instruction-Following Models for Question Answering" ☆82 · Updated 5 months ago
- 🧮 MathDial: A Dialog Tutoring Dataset with Rich Pedagogical Properties Grounded in Math Reasoning Problems, EMNLP Findings 2023 ☆45 · Updated 10 months ago
- [Data + code] ExpertQA: Expert-Curated Questions and Attributed Answers ☆124 · Updated 10 months ago
- Datasets from the paper "Towards Understanding Sycophancy in Language Models" ☆66 · Updated last year
- The repository contains the code and dataset for the Socratic Debugging task which is a novel task for Socratically Questioning Novice De… ☆13 · Updated 9 months ago
- Code for the ICLR 2024 paper "How to catch an AI liar: Lie detection in black-box LLMs by asking unrelated questions" ☆64 · Updated 7 months ago
- Backtracing: Retrieving the Cause of the Query, EACL 2024 Long Paper, Findings. ☆88 · Updated 6 months ago
- Codebase accompanying the Summary of a Haystack paper. ☆74 · Updated 4 months ago
- Code accompanying "How I learned to start worrying about prompt formatting". ☆99 · Updated 3 months ago
- Minimum Bayes Risk Decoding for Hugging Face Transformers ☆56 · Updated 7 months ago
- Multilingual Large Language Models Evaluation Benchmark ☆115 · Updated 5 months ago
- Baby's CoThought: Leveraging LLMs for Enhanced Reasoning in Compact Models ☆17 · Updated last week
- Discovering Data-driven Hypotheses in the Wild ☆51 · Updated 2 months ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions" ☆52 · Updated 7 months ago
- The GitHub repo for Goal Driven Discovery of Distributional Differences via Language Descriptions ☆68 · Updated last year
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages ☆41 · Updated last month
- [ICLR 2024 Spotlight] FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets ☆213 · Updated last year
- Inspecting and Editing Knowledge Representations in Language Models ☆111 · Updated last year
- Functional Benchmarks and the Reasoning Gap ☆82 · Updated 3 months ago
- Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment ☆53 · Updated 4 months ago