wjpoom/SPEC

[CVPR 2024] The official implementation of the paper "Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding"

☆52

Alternatives and similar repositories for SPEC

Users who are interested in SPEC are comparing it to the repositories listed below.

inst-it / inst-it
[NeurIPS 2025] The official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tun…
☆40 · Feb 20, 2025 · Updated last year
ytaek-oh / vl_compo
☆10 · Jul 5, 2024 · Updated 2 years ago
QizaoWang / CAMC-CCReID
Co-Attention Aligned Mutual Cross-Attention for Cloth-Changing Person Re-Identification [ACCV 2022 Oral]
☆17 · Dec 26, 2024 · Updated last year
QizaoWang / FIRe-CCReID
Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [TIFS 2024]
☆24 · Jul 1, 2024 · Updated 2 years ago
MengLcool / SliMM
☆25 · Dec 26, 2024 · Updated last year
ChenHsing / VIDiff
☆39 · Dec 4, 2023 · Updated 2 years ago
Annusha / xmic
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization, CVPR 2024
☆11 · Nov 7, 2024 · Updated last year
wdrink / OpenTokenizer
☆21 · Jan 17, 2025 · Updated last year
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
wdrink / ARM
ARM: An AutoRegressive Large Multimodal Model with Discrete Representations
☆50 · Jun 10, 2026 · Updated last month
Row11n / Prova
[AAAI-25] Official repository of "Comprehensive Multi-Modal Prototypes are Simple and Effective Classifiers for Vast-Vocabulary Object De…
☆20 · Dec 27, 2024 · Updated last year
Lackel / AGLA
[CVPR 2025] Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention
☆68 · Jul 16, 2024 · Updated 2 years ago
ivonajdenkoska / tulip
[ICLR 2025] Official code repository for "TULIP: Token-length Upgraded CLIP"
☆32 · Jan 26, 2026 · Updated 5 months ago
lwmming / AGS
Code for the AAAI 2024 paper: "AGS: Affordable and Generalizable Substitute Training for Transferable Adversarial Attack" (accepted).
☆12 · Mar 28, 2024 · Updated 2 years ago
YuxiXie / V-DPO
Preference Learning for LLaVA
☆60 · Nov 9, 2024 · Updated last year
wuw2019 / LoTLIP
[NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding
☆49 · Jan 14, 2025 · Updated last year
zjunlp / Deco
[ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation
☆146 · Sep 11, 2025 · Updated 10 months ago
kaist-ami / BEAF
[ECCV’24] Official repository for "BEAF: Observing Before-AFter Changes to Evaluate Hallucination in Vision-language Models"
☆22 · Mar 26, 2025 · Updated last year
FreedomIntelligence / TRIM
We introduce a new approach, Token Reduction using CLIP Metric (TRIM), aimed at improving the efficiency of MLLMs without sacrificing their…
☆22 · Jan 11, 2026 · Updated 6 months ago
ShareLab-SII / UniAR
[ICML 2026] The official implementation of paper "Unified Multimodal Autoregressive Modeling with Shared Context—Visual Tokenizer is Key …
☆46 · Jul 13, 2026 · Updated last week
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline of both physical and semantic counterfactual image-caption pairs.
☆19 · Jun 27, 2024 · Updated 2 years ago
swordlidev / Evaluation-Multimodal-LLMs-Survey
A Survey on Benchmarks of Multimodal Large Language Models
☆156 · Jul 13, 2026 · Updated last week
LALBJ / PAI
[ECCV 2024] Paying More Attention to Image: A Training-Free Method for Alleviating Hallucination in LVLMs
☆172 · Nov 6, 2024 · Updated last year
chancharikmitra / CCoT
[CVPR 2024] Official Code for the Paper "Compositional Chain-of-Thought Prompting for Large Multimodal Models"
☆142 · Jun 20, 2024 · Updated 2 years ago
OpenGVLab / MMIU
[ICLR 2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models
☆98 · Sep 14, 2024 · Updated last year
yuezih / less-is-more
Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)
☆58 · Oct 28, 2024 · Updated last year
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆104 · Mar 23, 2025 · Updated last year
KevinLight831 / AMC
[ToMM 2023] AMC: Adaptive Multi-expert Collaborative Network for Text-guided Image Retrieval
☆20 · Aug 30, 2024 · Updated last year
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆40 · Apr 18, 2025 · Updated last year
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆63 · Sep 13, 2024 · Updated last year
claws-lab / projection-in-MLLMs
Code and data for ACL 2024 paper on 'Cross-Modal Projection in Multimodal LLMs Doesn't Really Project Visual Attributes to Textual Space'
☆18 · Jul 21, 2024 · Updated last year
yuhui-zh15 / VLMClassifier
Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" (NeurIPS 2024)
☆98 · Oct 19, 2024 · Updated last year
YiyangZhou / CSR
[NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models
☆87 · Oct 26, 2025 · Updated 8 months ago
mertyg / vision-language-models-are-bows
Experiments and data for the paper "When and why vision-language models behave like bags-of-words, and what to do about it?" Oral @ ICLR …
☆294 · Jun 7, 2023 · Updated 3 years ago
Qinyu-Allen-Zhao / LVLM-LP
The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?
☆43 · Nov 1, 2024 · Updated last year
UCSB-AI / ComCLIP
Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching"
☆37 · Aug 18, 2024 · Updated last year
sled-group / COMFORT
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…
☆22 · Oct 24, 2024 · Updated last year
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆62 · May 2, 2024 · Updated 2 years ago
iancovert / locality-alignment
☆55 · Jan 17, 2025 · Updated last year