GraphPKU / CoI
Chain of Images for Intuitively Reasoning
☆9 · Updated last year
Alternatives and similar repositories for CoI:
Users who are interested in CoI are comparing it to the repositories listed below:
- Source code for the paper "Prefix Language Models are Unified Modal Learners" ☆43 · Updated 2 years ago
- [NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models ☆42 · Updated this week
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension. ☆65 · Updated 11 months ago
- This repository contains the code and data for the paper "VisOnlyQA: Large Vision Language Models Still Struggle with Visual Perception o… ☆22 · Updated last month
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ☆73 · Updated 5 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?" ☆38 · Updated last month
- Official code for "pi-Tuning: Transferring Multimodal Foundation Models with Optimal Multi-task Interpolation", ICML 2023. ☆32 · Updated last year
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆19 · Updated last year
- The released data for paper "Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models". ☆32 · Updated last year
- Counterfactual Reasoning VQA Dataset ☆25 · Updated last year
- [DMLR 2024] Benchmarking Robustness of Multimodal Image-Text Models under Distribution Shift ☆36 · Updated last year
- Official Code for ACL 2023 Outstanding Paper: World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Languag… ☆32 · Updated last year
- PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging" ☆38 · Updated last year
- [ACL 2023] Code for paper “Tailoring Instructions to Student’s Learning Levels Boosts Knowledge Distillation” (https://arxiv.org/abs/2305.… ☆38 · Updated last year
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding ☆52 · Updated last month
- Evaluation framework for paper "VisualWebBench: How Far Have Multimodal LLMs Evolved in Web Page Understanding and Grounding?" ☆55 · Updated 6 months ago
- Code for "Seeking Neural Nuggets: Knowledge Transfer in Large Language Models from a Parametric Perspective" ☆32 · Updated 11 months ago
- Code for paper "Unraveling Cross-Modality Knowledge Conflicts in Large Vision-Language Models." ☆42 · Updated 6 months ago
- [ACL 2023] Delving into the Openness of CLIP ☆23 · Updated 2 years ago
- [NeurIPS 2023] Make Your Pre-trained Model Reversible: From Parameter to Memory Efficient Fine-Tuning ☆31 · Updated last year
- Official Code of IdealGPT ☆36 · Updated last year
- Improving Language Understanding from Screenshots. Paper: https://arxiv.org/abs/2402.14073 ☆28 · Updated 9 months ago
- Mosaic IT: Enhancing Instruction Tuning with Data Mosaics ☆18 · Updated 2 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆58 · Updated last year
- Code for "A Sober Look at Progress in Language Model Reasoning" paper ☆41 · Updated 2 weeks ago