[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆106 · Nov 9, 2023 · Updated 2 years ago
Alternatives and similar repositories for Cola
Users that are interested in Cola are comparing it to the libraries listed below.
- Syphus: Automatic Instruction-Response Generation Pipeline ☆14 · Dec 14, 2023 · Updated 2 years ago
- Benchmarking and Analyzing Generative Data for Visual Recognition ☆26 · Jul 25, 2023 · Updated 2 years ago
- [ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents ☆54 · Feb 27, 2025 · Updated last year
- [TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics" ☆20 · Sep 15, 2023 · Updated 2 years ago
- On-Device Domain Generalization ☆47 · Nov 9, 2022 · Updated 3 years ago
- ☆21 · Oct 10, 2023 · Updated 2 years ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23 ☆102 · Apr 30, 2024 · Updated 2 years ago
- [NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?" ☆183 · Mar 4, 2024 · Updated 2 years ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆38 · Jan 3, 2024 · Updated 2 years ago
- ☆10 · Jun 28, 2023 · Updated 2 years ago
- IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks ☆57 · Sep 26, 2024 · Updated last year
- CamReasoner: Reinforcing Camera Movement Understanding via Structured Spatial Reasoning ☆30 · May 23, 2026 · Updated 2 weeks ago
- Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021 ☆19 · Jul 27, 2021 · Updated 4 years ago
- [COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs ☆145 · Aug 23, 2024 · Updated last year
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, … ☆104 · Dec 25, 2025 · Updated 5 months ago
- T2VScore: Towards A Better Metric for Text-to-Video Generation ☆81 · Apr 10, 2024 · Updated 2 years ago
- ☆16 · Oct 21, 2024 · Updated last year
- ☆58 · Apr 24, 2024 · Updated 2 years ago
- THOUGHTSCULPT, a general reasoning and search method for complex tasks ☆13 · Dec 13, 2024 · Updated last year
- A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but … ☆13 · Dec 1, 2022 · Updated 3 years ago
- (ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest ☆555 · Jun 3, 2025 · Updated last year
- [ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models ☆82 · Mar 6, 2026 · Updated 3 months ago
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU ☆360 · Dec 18, 2023 · Updated 2 years ago
- [ACL 2026 Oral] Official implementation of LaMI: Augmenting Large Language Models via Late Multi-Image Fusion ☆19 · May 18, 2026 · Updated 3 weeks ago
- A local AI assistant running on your device. It turns your files into actionable memory. ☆55 · Mar 24, 2026 · Updated 2 months ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 · Feb 5, 2024 · Updated 2 years ago
- Long Context Transfer from Language to Vision ☆405 · Mar 18, 2025 · Updated last year
- Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?" ☆302 · May 22, 2025 · Updated last year
- [EMNLP'23 Oral] ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue PyTorch Implementation ☆12 · Dec 4, 2023 · Updated 2 years ago
- [WACV 2024] Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining, WACV 2024 ☆13 · Jan 3, 2024 · Updated 2 years ago
- Official repo for the TMLR paper "Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners" ☆29 · Apr 27, 2024 · Updated 2 years ago
- ☆24 · Oct 9, 2023 · Updated 2 years ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆601 · Oct 6, 2024 · Updated last year
- [ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain ☆107 · Mar 14, 2024 · Updated 2 years ago
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control" ☆53 · Sep 21, 2023 · Updated 2 years ago
- 【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge ☆15 · Jul 18, 2023 · Updated 2 years ago
- [IEEE TPAMI-2024] Pair then Relation: Pair-Net for Panoptic Scene Graph Generation ☆101 · Nov 20, 2024 · Updated last year
- [NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images ☆58 · Dec 6, 2021 · Updated 4 years ago
- ☆11 · Nov 5, 2024 · Updated last year