ZhaofengWu / counterfactual-evaluation
☆44 · Updated 8 months ago
Related projects:
- ☆61 · Updated 3 months ago
- ☆32 · Updated 5 months ago
- Analyzing LLM Alignment via Token distribution shift ☆13 · Updated 7 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆51 · Updated 5 months ago
- Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision ☆78 · Updated last week
- [ICLR 2023] Code for our paper "Selective Annotation Makes Language Models Better Few-Shot Learners" ☆105 · Updated last year
- GitHub repository for "FELM: Benchmarking Factuality Evaluation of Large Language Models" ☆54 · Updated 8 months ago
- ☆39 · Updated 9 months ago
- Evaluate the Quality of Critique ☆35 · Updated 3 months ago
- ☆24 · Updated 4 months ago
- Official repository for ICLR 2024 Spotlight paper "Large Language Models Are Not Robust Multiple Choice Selectors" ☆32 · Updated 3 months ago
- WikiWhy is a new benchmark for evaluating LLMs' ability to explain cause-effect relationships. It is a QA dataset containing 9000… ☆44 · Updated 9 months ago
- [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions ☆96 · Updated last week
- ☆69 · Updated 10 months ago
- ☆44 · Updated 2 weeks ago
- Methods and evaluation for aligning language models temporally ☆24 · Updated 6 months ago
- Supporting code for the ReCEval paper ☆26 · Updated this week
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models ☆33 · Updated 9 months ago
- Repo for the paper "Large Language Models Struggle to Learn Long-Tail Knowledge" ☆71 · Updated last year
- Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs ☆29 · Updated 7 months ago
- ☆77 · Updated last year
- ☆23 · Updated last year
- Explore what LLMs are really learning during SFT ☆26 · Updated 5 months ago
- ☆46 · Updated 10 months ago
- Trial and Error: Exploration-Based Trajectory Optimization of LLM Agents (ACL 2024 Main Conference) ☆86 · Updated 3 months ago
- ☆28 · Updated 7 months ago
- Code and data for the paper "Context-faithful Prompting for Large Language Models" ☆37 · Updated last year
- ☆49 · Updated last year
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆20 · Updated 6 months ago
- ☆21 · Updated 4 months ago