allenai / understanding_mcqa
Code for the arXiv preprint "Answer, Assemble, Ace: Understanding How Transformers Answer Multiple Choice Questions"
☆12Updated 3 weeks ago
Alternatives and similar repositories for understanding_mcqa
Users who are interested in understanding_mcqa are comparing it to the libraries listed below.
- Evaluating the Moral Beliefs Encoded in LLMs☆27Updated 8 months ago
- Scalable Meta-Evaluation of LLMs as Evaluators☆42Updated last year
- Mangrove is the backend module of Estuary, a framework for building multimodal real-time Socially Intelligent Agents (SIAs).☆13Updated last month
- Implementation of the paper: "AssistantBench: Can Web Agents Solve Realistic and Time-Consuming Tasks?"☆60Updated 8 months ago
- ⚓️ Repository for the "Thought Anchors: Which LLM Reasoning Steps Matter?" paper.☆65Updated last week
- Reasoning by Communicating with Agents☆29Updated 3 months ago
- 🌟 SwarmAgent: A framework for simulating social group dynamics using multi-agent collaboration, aiding insights into collective behavior…☆12Updated last year
- Measuring the situational awareness of language models☆38Updated last year
- Datasets from the paper "Towards Understanding Sycophancy in Language Models"☆88Updated last year
- Code and Dataset for Learning to Solve Complex Tasks by Talking to Agents☆24Updated 3 years ago
- 👻 Code and benchmark for our EMNLP 2023 paper - "FANToM: A Benchmark for Stress-testing Machine Theory of Mind in Interactions"☆55Updated last year
- Official repo for Learning to Reason for Long-Form Story Generation☆68Updated 4 months ago
- A toolkit for describing model features and intervening on those features to steer behavior.☆197Updated 9 months ago
- Functional Benchmarks and the Reasoning Gap☆88Updated 10 months ago
- DialOp: Decision-oriented dialogue environments for collaborative language agents☆109Updated 9 months ago
- [EMNLP '23] Discriminator-Guided Chain-of-Thought Reasoning☆48Updated 10 months ago
- A set of utilities for running few-shot prompting experiments on large-language models☆122Updated last year
- [NeurIPS 2024] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages☆49Updated 2 weeks ago
- Official implementation of LoT paper: "Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic"☆26Updated last year
- Mixture of Experts (MoE) techniques for enhancing LLM performance through expert-driven prompt mapping and adapter combinations.☆12Updated last year
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data☆43Updated 6 months ago
- NAACL 2024. Code & Dataset for "🌁 Bridging the Novice-Expert Gap via Models of Decision-Making: A Case Study on Remediating Math Mistake…☆43Updated last year
- The repository contains the code and dataset for the Socratic Debugging task which is a novel task for Socratically Questioning Novice De…☆18Updated last year
- Framework and toolkits for building and evaluating collaborative agents that can work together with humans.☆93Updated 4 months ago
- CausalGym: Benchmarking causal interpretability methods on linguistic tasks☆46Updated 8 months ago