htqin / GoogleBard-VisUnderstandLinks

How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges

☆30

Alternatives and similar repositories for GoogleBard-VisUnderstand

Users that are interested in GoogleBard-VisUnderstand are comparing it to the libraries listed below

Sorting:

umd-huang-lab / perceptionCLIP
Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"
☆79Updated last year
FactoDeepLearning / MultitaskVLFM
☆26Updated 2 years ago
AtsuMiyai / UPD
[ACL2025] Unsolvable Problem Detection: Robust Understanding Evaluation for Large Multimodal Models
☆78Updated 5 months ago
lambert-x / ProLab
Official Pytorch Implementation of Paper "A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Des…
☆56Updated 2 months ago
sunsmarterjie / ChatterBox
[AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues
☆58Updated 6 months ago
eric-ai-lab / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37Updated last year
TencentARC / ViSFT
☆35Updated last year
NVlabs / STL
Official Pytorch Implementation of Self-emerging Token Labeling
☆35Updated last year
kaiyuyue / nxtp
PyTorch Implementation of Object Recognition as Next Token Prediction [CVPR'24 Highlight]
☆181Updated 6 months ago
iancovert / locality-alignment
☆53Updated 10 months ago
RenShuhuai-Andy / TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆49Updated last year
AIFEG / BenchLMM
[ECCV 2024] BenchLMM: Benchmarking Cross-style Visual Capability of Large Multimodal Models
☆86Updated last year
OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…
☆98Updated last year
EternityYW / Gemini-Commonsense-Evaluation
Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"
☆37Updated last year
m1k2zoo / negbench
Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation"
☆38Updated 6 months ago
kirill-vish / Beyond-INet
Code for experiments for "ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy"
☆101Updated last year
bfshi / TOAST
Official code for "TOAST: Transfer Learning via Attention Steering"
☆186Updated 2 years ago
jeykigung / HiCLIP
☆30Updated 2 years ago
alhojel / visual_task_vectors
☆39Updated last year
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13Updated 2 years ago
MCR-PEFT / Ex-MCR
☆45Updated 6 months ago
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆103Updated 2 years ago
ZechengLi19 / CIM
[IJCAI'23] Complete Instances Mining for Weakly Supervised Instance Segmentation
☆38Updated last year
gregor-ge / mBLIP
☆87Updated last year
Christina200 / Online-LoRA-official
[WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L…
☆51Updated 2 months ago
DCDmllm / MorphTokens
☆43Updated last year
isekai-portal / Link-Context-Learning
☆100Updated last year
csarron / PuMer
[ACL 2023] PuMer: Pruning and Merging Tokens for Efficient Vision Language Models
☆34Updated last year
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆94Updated last year
SHI-Labs / VisPer-LM
[NeurIPS 2025] Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation, arXiv 2024
☆64Updated last month