khuangaf / CHOCOLATE
Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"
☆23 · Updated 3 months ago
Related projects:
- [ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts" ☆51 · Updated 5 months ago
- Evaluate the Quality of Critique ☆35 · Updated 3 months ago
- [ICLR'24 Spotlight] Tool-Augmented Reward Modeling ☆33 · Updated 6 months ago
- [ACL 2024] Code for "MoPS: Modular Story Premise Synthesis for Open-Ended Automatic Story Generation" ☆29 · Updated 2 months ago
- This is the implementation of LeCo ☆16 · Updated 2 months ago
- Supporting code for ReCEval paper ☆26 · Updated this week
- BRIGHT: A Realistic and Challenging Benchmark for Reasoning-Intensive Retrieval ☆41 · Updated last month
- Code implementation of synthetic continued pretraining ☆13 · Updated this week
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆24 · Updated 2 months ago
- InstructRAG: Instructing Retrieval-Augmented Generation with Explicit Denoising ☆32 · Updated 2 months ago
- [ACL 2023] Solving Math Word Problems via Cooperative Reasoning induced Language Models ☆33 · Updated 9 months ago
- Code and data for paper "Context-faithful Prompting for Large Language Models". ☆37 · Updated last year
- Source code of "Reasons to Reject? Aligning Language Models with Judgments" ☆54 · Updated 6 months ago
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions ☆37 · Updated 2 months ago
- Benchmarking Benchmark Leakage in Large Language Models ☆39 · Updated 4 months ago
- GPT as Human ☆17 · Updated 7 months ago
- Official repository for paper "Weak-to-Strong Extrapolation Expedites Alignment" ☆62 · Updated 3 months ago
- The official implementation of paper "Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agen… ☆20 · Updated 6 months ago
- Repository for paper Tools Are Instrumental for Language Agents in Complex Environments ☆32 · Updated 8 months ago
- Official code repository for PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation, EMNLP 2023 ☆10 · Updated 9 months ago
- [ACL 2024 Findings] CriticBench: Benchmarking LLMs for Critique-Correct Reasoning ☆20 · Updated 6 months ago
- [COLM'24] "How Easily do Irrelevant Inputs Skew the Responses of Large Language Models?" ☆18 · Updated last week
- Visual and Embodied Concepts evaluation benchmark ☆21 · Updated 11 months ago
- Are LLMs Capable of Data-based Statistical and Causal Reasoning? Benchmarking Advanced Quantitative Reasoning with Data ☆28 · Updated last month