[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆105Nov 9, 2023Updated 2 years ago
Alternatives and similar repositories for Cola
Users that are interested in Cola are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Syphus: Automatic Instruction-Response Generation Pipeline☆14Dec 14, 2023Updated 2 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition☆26Jul 25, 2023Updated 2 years ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents☆52Feb 27, 2025Updated last year
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"☆20Sep 15, 2023Updated 2 years ago
- On-Device Domain Generalization☆46Nov 9, 2022Updated 3 years ago
- ☆21Oct 10, 2023Updated 2 years ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23☆103Apr 30, 2024Updated last year
- CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning☆28Feb 11, 2026Updated last month
- [NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"☆182Mar 4, 2024Updated 2 years ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆37Jan 3, 2024Updated 2 years ago
- ☆10Jun 28, 2023Updated 2 years ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks☆57Sep 26, 2024Updated last year
- Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021☆19Jul 27, 2021Updated 4 years ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs☆146Aug 23, 2024Updated last year
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …☆104Dec 25, 2025Updated 2 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation☆81Apr 10, 2024Updated last year
- ☆16Oct 21, 2024Updated last year
- ☆58Apr 24, 2024Updated last year
- THOUGHTSCULPT, a general reasoning and search method for complex tasks☆13Dec 13, 2024Updated last year
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …☆13Dec 1, 2022Updated 3 years ago
- (ECCVW 2025)GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest☆551Jun 3, 2025Updated 9 months ago
- [ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models☆81Mar 6, 2026Updated 2 weeks ago
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU☆360Dec 18, 2023Updated 2 years ago
- A local AI assistant running on your device. It turns your files into actionable memory.☆54Mar 14, 2026Updated last week
- Long Context Transfer from Language to Vision☆402Mar 18, 2025Updated last year
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"☆292May 22, 2025Updated 10 months ago
- [EMNLP'23 Oral] ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue PyTorch Implementation☆13Dec 4, 2023Updated 2 years ago
- Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining, WACV 2024☆13Jan 3, 2024Updated 2 years ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners"☆29Apr 27, 2024Updated last year
- ☆24Oct 9, 2023Updated 2 years ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content☆605Oct 6, 2024Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain☆106Mar 14, 2024Updated 2 years ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Sep 21, 2023Updated 2 years ago
- 【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge☆15Jul 18, 2023Updated 2 years ago
- [IEEE TPAMI-2024] Pair then Relation: Pair-Net for Panoptic Scene Graph Generation☆99Nov 20, 2024Updated last year
- ☆11Nov 5, 2024Updated last year
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images☆58Dec 6, 2021Updated 4 years ago
- [ECCV2022] New benchmark for evaluating pre-trained model; New supervised contrastive learning framework.☆110Dec 8, 2023Updated 2 years ago
- Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models☆209Jan 8, 2025Updated last year