cliangyu/Cola

[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"

☆106

Alternatives and similar repositories for Cola

Users who are interested in Cola are comparing it to the repositories listed below.

pufanyi / syphus
Syphus: Automatic Instruction-Response Generation Pipeline
☆14 · Dec 14, 2023 · Updated 2 years ago
Luodian / GenBench
Benchmarking and Analyzing Generative Data for Visual Recognition
☆26 · Jul 25, 2023 · Updated 2 years ago
shulin16 / MMInA
[ACL2025 Findings] Benchmarking Multihop Multimodal Internet Agents
☆54 · Feb 27, 2025 · Updated last year
UCSC-VLAA / Sight-Beyond-Text
[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
☆20 · Sep 15, 2023 · Updated 2 years ago
archiki / RepARe
☆21 · Oct 10, 2023 · Updated 2 years ago
KaiyangZhou / on-device-dg
On-Device Domain Generalization
☆47 · Nov 9, 2022 · Updated 3 years ago
LilyDaytoy / OpenPVSG
Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23
☆104 · Apr 30, 2024 · Updated 2 years ago
EternityYW / Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆38 · Jan 3, 2024 · Updated 2 years ago
FrankFundel / SGCond
☆10 · Jun 28, 2023 · Updated 3 years ago
yashkant / concat-vqa
Official code for the paper "Contrast and Classify: Training Robust VQA Models" published at ICCV, 2021
☆19 · Jul 27, 2021 · Updated 4 years ago
ZhangYuanhan-AI / visual_prompt_retrieval
[NeurIPS2023] Official implementation and model release of the paper "What Makes Good Examples for Visual In-Context Learning?"
☆182 · Mar 4, 2024 · Updated 2 years ago
zzxslp / SoM-LLaVA
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
☆145 · Aug 23, 2024 · Updated last year
Nicous20 / FunQA
FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …
☆104 · Dec 25, 2025 · Updated 6 months ago
WildVision-AI / WildVision-Bench
☆17 · Oct 21, 2024 · Updated last year
ugorsahin / Generative-Negative-Mining
[WACV 2024] Enhancing Multimodal Compositional Reasoning of Visual Language Models with Generative Negative Mining, WACV 2024
☆13 · Jan 3, 2024 · Updated 2 years ago
synvo-ai / local-cocoa
A local AI assistant running on your device. It turns your files into actionable memory.
☆55 · Mar 24, 2026 · Updated 3 months ago
cyzus / thoughtsculpt
THOUGHTSCULPT, a general reasoning and search method for complex tasks
☆13 · Dec 13, 2024 · Updated last year
xvjiarui / IMProv
IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks
☆57 · Sep 26, 2024 · Updated last year
peterljq / Tutorial-of-Data-Distillation-and-Condensation
A comprehensive overview of Data Distillation and Condensation (DDC). DDC is a data-centric task where a representative (i.e., small but …
☆13 · Dec 1, 2022 · Updated 3 years ago
AtsuMiyai / UPD
[ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆82 · Mar 6, 2026 · Updated 4 months ago
HaozheZhao / MIC
MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU
☆361 · Dec 18, 2023 · Updated 2 years ago
EvolvingLMMs-Lab / engram
Privacy-first AI memory layer - Signal for AI Memory. E2EE, local-first, works with Claude, Cursor, and any MCP-compatible AI.
☆23 · Jun 12, 2026 · Updated last month
showlab / T2VScore
T2VScore: Towards A Better Metric for Text-to-Video Generation
☆81 · Apr 10, 2024 · Updated 2 years ago
jshilong / GPT4RoI
(ECCVW 2025) GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556 · Jun 3, 2025 · Updated last year
king159 / Pair-Net
[IEEE TPAMI-2024] Pair then Relation: Pair-Net for Panoptic Scene Graph Generation
☆101 · Nov 20, 2024 · Updated last year
Hritikbansal / videocon
☆58 · Apr 24, 2024 · Updated 2 years ago
ImKeTT / ReSee
[EMNLP'23 Oral] ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue PyTorch Implementation
☆12 · Dec 4, 2023 · Updated 2 years ago
EvolvingLMMs-Lab / LongVA
Long Context Transfer from Language to Vision
☆407 · Mar 18, 2025 · Updated last year
JiwanChung / vlis
☆24 · Oct 9, 2023 · Updated 2 years ago
open-compass / MMBench
Official Repo of "MMBench: Is Your Multi-modal Model an All-around Player?"
☆306 · May 22, 2025 · Updated last year
HenryHZY / VL-PET
[ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"
☆53 · Sep 21, 2023 · Updated 2 years ago
pkunlp-icler / PCA-EVAL
[ACL 2024] PCA-Bench: Evaluating Multimodal Large Language Models in Perception-Cognition-Action Chain
☆107 · Mar 14, 2024 · Updated 2 years ago
tomchen-ctj / CVPR23-LOVEU-AQTC
【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge
☆15 · Jul 18, 2023 · Updated 3 years ago
kennymckormick / ARAS-Dataset
☆11 · Nov 5, 2024 · Updated last year
jy0205 / LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603 · Oct 6, 2024 · Updated last year
ZhangYuanhan-AI / OmniBenchmark
[ECCV2022] New benchmark for evaluating pre-trained model; New supervised contrastive learning framework.
☆110 · Dec 8, 2023 · Updated 2 years ago
deepglint / ALIP
[ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
☆106 · Sep 18, 2023 · Updated 2 years ago
Jiahao000 / ORL
[NeurIPS 2021] ORL: Unsupervised Object-Level Representation Learning from Scene Images
☆58 · Dec 6, 2021 · Updated 4 years ago
YuchenLiu98 / COMM
Pytorch code for paper From CLIP to DINO: Visual Encoders Shout in Multi-modal Large Language Models
☆211 · Jan 8, 2025 · Updated last year