khuangaf / CHOCOLATE
Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
☆24 · Updated 9 months ago
Alternatives and similar repositories for CHOCOLATE:
Users interested in CHOCOLATE are comparing it to the repositories listed below.
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆27 · Updated 8 months ago
- AbstainQA, ACL 2024 ☆25 · Updated 5 months ago
- Evaluate the Quality of Critique ☆35 · Updated 9 months ago
- Code and data for the paper "Context-faithful Prompting for Large Language Models" ☆39 · Updated last year
- ☆20 · Updated 7 months ago
- RAG-RewardBench: Benchmarking Reward Models in Retrieval Augmented Generation for Preference Alignment ☆15 · Updated 2 months ago
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆66 · Updated 11 months ago
- ☆34 · Updated 11 months ago
- ☆29 · Updated last year
- Supporting code for the ReCEval paper ☆28 · Updated 5 months ago
- ☆67 · Updated last year
- Repository for "Propagating Knowledge Updates to LMs Through Distillation" (NeurIPS 2023) ☆25 · Updated 6 months ago
- ☆48 · Updated 2 months ago
- [EMNLP'24 (Main)] DRPO (Dynamic Rewarding with Prompt Optimization) is a tuning-free approach for self-alignment. DRPO leverages a search-… ☆20 · Updated 3 months ago
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆58 · Updated last year
- Code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆22 · Updated 3 months ago
- [ICLR'24 Spotlight] Tool-Augmented Reward Modeling ☆44 · Updated 2 months ago
- Repository for the paper "CREATOR: Tool Creation for Disentangling Abstract and Concrete Reasoning of Large Language Models" ☆23 · Updated last year
- ☆14 · Updated last year
- MUFFIN: Curating Multi-Faceted Instructions for Improving Instruction-Following ☆15 · Updated 4 months ago
- [NAACL 2024] Making Language Models Better Tool Learners with Execution Feedback ☆41 · Updated 11 months ago
- ☆41 · Updated last year
- Released data for the paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models" ☆32 · Updated last year
- Evaluation framework for the paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆49 · Updated 4 months ago
- ☆15 · Updated 7 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆24 · Updated last year
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆85 · Updated last month
- Codebase for "Instruction Following without Instruction Tuning" ☆33 · Updated 5 months ago
- InstructIR, a novel benchmark specifically designed to evaluate the instruction-following ability of information retrieval models. Our foc… ☆31 · Updated 9 months ago