khuangaf / CHOCOLATELinks

Code and data for the ACL 2024 Findings paper "Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning"

☆27

Alternatives and similar repositories for CHOCOLATE

Users that are interested in CHOCOLATE are comparing it to the libraries listed below

Sorting:

princeton-nlp / PTP
Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073
☆31Updated last year
OSU-NLP-Group / Middleware
Middleware for LLMs: Tools Are Instrumental for Language Agents in Complex Environments (EMNLP'2024)
☆37Updated 10 months ago
abhika-m / FAVA
☆74Updated last year
princeton-nlp / CharXiv
[NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
☆128Updated 6 months ago
WHGTyen / BIG-Bench-Mistake
A dataset of LLM-generated chain-of-thought steps annotated with mistake location.
☆82Updated last year
wwxu21 / CUT
Source code of "Reasons to Reject? Aligning Language Models with Judgments"
☆58Updated last year
open-compass / CriticEval
[NeurIPS 2024] A comprehensive benchmark for evaluating critique ability of LLMs
☆47Updated 11 months ago
lifan-yuan / CRAFT
Code for ICLR 2024 paper "CRAFT: Customizing LLMs by Creating and Retrieving from Specialized Toolsets"
☆60Updated last year
clinicalml / co-llm
Co-LLM: Learning to Decode Collaboratively with Multiple Language Models
☆122Updated last year
DAMO-NLP-SG / contrastive-cot
Contrastive Chain-of-Thought Prompting
☆68Updated last year
psunlpgroup / VisOnlyQA
This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…
☆27Updated 3 months ago
oriyor / reasoning-on-cots
Implementation of the paper: "Answering Questions by Meta-Reasoning over Multiple Chains of Thought"
☆96Updated last year
fangyuan-ksgk / CoT-Reasoning-without-Prompting
Unofficial Implementation of Chain-of-Thought Reasoning Without Prompting
☆33Updated last year
ernie-research / Tool-Augmented-Reward-Model
[ICLR'24 spotlight] Tool-Augmented Reward Modeling
☆51Updated 4 months ago
GAIR-NLP / scaleeval
Scalable Meta-Evaluation of LLMs as Evaluators
☆42Updated last year
GAIR-NLP / benbench
Benchmarking Benchmark Leakage in Large Language Models
☆55Updated last year
john-hewitt / implicit-ins
Codebase for Instruction Following without Instruction Tuning
☆36Updated last year
Pranjal2041 / AdaptiveConsistency
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs
☆39Updated last year
GasolSun36 / Iter-CoT
[NAACL 2024] Enhancing Chain-of-Thoughts Prompting with Iterative Bootstrapping in Large Language Models
☆86Updated last year
princeton-nlp / LM-Science-Tutor
☆46Updated last year
maszhongming / ParaKnowTransfer
Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective"
☆32Updated last year
csitfun / LogiCoT
the instructions and demonstrations for building a formal logical reasoning capable GLM
☆54Updated last year
GAIR-NLP / MetaCritique
Evaluate the Quality of Critique
☆36Updated last year
dqxiu / KAssess
☆14Updated 2 years ago
OSU-NLP-Group / LLM-Knowledge-Conflict
[ICLR'24 Spotlight] "Adaptive Chameleon or Stubborn Sloth: Revealing the Behavior of Large Language Models in Knowledge Conflicts"
☆77Updated last year
tianyi-lab / MoE-Embedding
[ICLR 2025 Oral] "Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free"
☆82Updated last year
cambridgeltl / PairS
Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators (Liu et al.; COLM 2024)
☆49Updated 9 months ago
hkust-nlp / B-STaR
B-STAR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
☆85Updated 5 months ago
VisualWebBench / VisualWebBench
Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"
☆59Updated last year
banksy23 / XCoder
☆36Updated 3 months ago