[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆105 · Nov 9, 2023 · Updated 2 years ago
Alternatives and similar repositories for Cola
Users that are interested in Cola are comparing it to the libraries listed below
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆48 · Feb 27, 2025 · Updated last year
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Jul 25, 2023 · Updated 2 years ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆58 · Sep 26, 2024 · Updated last year
- ☆58 · Apr 24, 2024 · Updated last year
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆102 · Apr 30, 2024 · Updated last year
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆145 · Aug 23, 2024 · Updated last year
- [NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?" ☆183 · Mar 4, 2024 · Updated last year
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆81 · Apr 10, 2024 · Updated last year
- On-Device Domain Generalization ☆46 · Nov 9, 2022 · Updated 3 years ago
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 · Sep 15, 2023 · Updated 2 years ago
- [CVPRW'23] First Place Solution to the CVPR'2023 AQTC Challenge ☆15 · Jul 18, 2023 · Updated 2 years ago
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but … ☆13 · Dec 1, 2022 · Updated 3 years ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆37 · Jan 3, 2024 · Updated 2 years ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 · Feb 5, 2024 · Updated 2 years ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆104 · Dec 25, 2025 · Updated 2 months ago
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆106 · Mar 14, 2024 · Updated last year
- [IEEE TPAMI-2024] Pair then Relation: Pair-Net for Panoptic Scene Graph Generation ☆99 · Nov 20, 2024 · Updated last year
- Text-Guided Generation of Full-Body Image with Preserved Reference Face for Customized Animation ☆24 · Jun 24, 2024 · Updated last year
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners" ☆30 · Apr 27, 2024 · Updated last year
- MMICL, a state-of-the-art VLM with in-context learning (ICL) ability, from PKU ☆360 · Dec 18, 2023 · Updated 2 years ago
- PyTorch code for the paper "From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models" ☆206 · Jan 8, 2025 · Updated last year
- ☆14 · Oct 16, 2023 · Updated 2 years ago
- This repo contains the official PyTorch implementation of vLMIG: Improving Visual Commonsense in Language Models via Multiple Image Gener… ☆17 · Jul 1, 2024 · Updated last year
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ☆551 · Jun 3, 2025 · Updated 9 months ago
- Long Context Transfer from Language to Vision ☆402 · Mar 18, 2025 · Updated 11 months ago
- [ICLR2025] IV-Mixed Sampler: Leveraging Image Diffusion Models for Enhanced Video Synthesis ☆39 · Feb 17, 2025 · Updated last year
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆288 · May 22, 2025 · Updated 9 months ago
- This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs. ☆19 · Jun 27, 2024 · Updated last year
- ☆16 · Oct 21, 2024 · Updated last year
- ☆16 · Apr 23, 2024 · Updated last year
- ☆16 · Apr 7, 2024 · Updated last year
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Jun 4, 2025 · Updated 8 months ago
- A local AI assistant running on your device. It turns your files into actionable memory. ☆54 · Feb 15, 2026 · Updated 2 weeks ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆131 · Aug 21, 2024 · Updated last year
- LLaVA-Interactive-Demo ☆380 · Jul 25, 2024 · Updated last year
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆106 · Nov 28, 2024 · Updated last year
- [ECCV2024] 🐙Octopus, an embodied vision-language model trained with RLEF, emerging superior in embodied visual planning and programming. ☆295 · May 20, 2024 · Updated last year
- "Comp4D: Compositional 4D Scene Generation", Dejia Xu*, Hanwen Liang*, Neel P. Bhatt, Hezhen Hu, Hanxue Liang, Konstantinos N. Platanioti… ☆78 · Aug 25, 2024 · Updated last year