zeyofu / ReFocus_Code
Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding
☆17Updated 2 months ago
Alternatives and similar repositories for ReFocus_Code:
Users that are interested in ReFocus_Code are comparing it to the libraries listed below
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆14Updated 2 weeks ago
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?"☆49Updated 4 months ago
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models".☆32Updated last year
- ☆10Updated 4 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models."☆41Updated 4 months ago
- ☆15Updated 7 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆41Updated 2 weeks ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆31Updated 4 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.☆18Updated 8 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o…☆22Updated 3 months ago
- Preference Learning for LLaVA☆39Updated 4 months ago
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation☆47Updated last year
- ☆24Updated last year
- Official repo of the ICLR 2025 paper "MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos"☆25Updated 5 months ago
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"☆38Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆13Updated 8 months ago
- ☆19Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆44Updated last year
- PyTorch implementation of StableMask (ICML'24)☆12Updated 8 months ago
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆44Updated 2 months ago
- [EMNLP 2024] Official implementation of "Hierarchical Deconstruction of LLM Reasoning: A Graph-Based Framework for Analyzing Knowledge Ut…☆21Updated 3 months ago
- ☆22Updated 5 months ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆19Updated last year
- ✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio☆41Updated 4 months ago
- Open-Pandora: On-the-fly Control Video Generation☆32Updated 3 months ago
- ☆39Updated 4 months ago
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"☆35Updated 6 months ago
- Official implementation of "Automated Generation of Challenging Multiple-Choice Questions for Vision Language Model Evaluation" (CVPR 202…☆23Updated this week