uvavision / SelfEQ

[CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".

☆28

Alternatives and similar repositories for SelfEQ

Users who are interested in SelfEQ are comparing it to the repositories listed below.

dhg-wei / TOPA
(NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment
☆30 Updated last year
Becomebright / GroundVQA
Official PyTorch code of GroundVQA (CVPR'24)
☆64 Updated last year
ExplainableML / EgoCVR
[ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval
☆41 Updated 7 months ago
Yanqing0327 / MLLMs-Augmented
The official implementation of 《MLLMs-Augmented Visual-Language Representation Learning》
☆31 Updated last year
whwu95 / FreeVA
FreeVA: Offline MLLM as Training-Free Video Assistant
☆65 Updated last year
ailab-kyunghee / CM2_DVC
[CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval
☆63 Updated last year
doc-doc / NExT-GQA
Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)
☆83 Updated last year
chunmeifeng / SPRC
【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval
☆91 Updated last year
NMS05 / Patch-Aligned-Contrastive-Learning
☆23 Updated 2 years ago
HengLan / CGSTVG
[CVPR 2024] Context-Guided Spatio-Temporal Video Grounding
☆62 Updated last year
DCDmllm / Momentor
☆80 Updated last year
Code-kunkun / ZS-CIR
[BMVC 2023] Zero-shot Composed Text-Image Retrieval
☆54 Updated last year
OmkarThawakar / composed-video-retrieval
Composed Video Retrieval
☆61 Updated last year
linzhiqiu / visual_gpt_score
VisualGPTScore for visio-linguistic reasoning
☆27 Updated 2 years ago
showlab / MovieSeq
[ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences
☆40 Updated 8 months ago
jpthu17 / HBI
[CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
☆122 Updated 11 months ago
takomc / amp
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆20 Updated last year
lezhang7 / Enhance-FineGrained
[CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding
☆53 Updated 7 months ago
vinid / neg_clip
NegCLIP.
☆38 Updated 2 years ago
Paranioar / UniPT
[CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"
☆67 Updated last year
rxtan2 / Koala-video-llm
☆36 Updated last year
Ruiyang-061X / Awesome-MLLM-Uncertainty
✨ A curated list of papers on uncertainty in multimodal large language models (MLLMs).
☆56 Updated 8 months ago
levymsn / ChatIR
Official repository of "Chatting Makes Perfect: Chat-based Image Retrieval"
☆30 Updated 10 months ago
showlab / datacentric.vlp
Compress conventional Vision-Language Pre-training data
☆52 Updated 2 years ago
sterzhang / PVIT
Official Repository of Personalized Visual Instruct Tuning
☆33 Updated 9 months ago
PolyU-ChenLab / ETBench
👾 E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding (NeurIPS 2024)
☆71 Updated 10 months ago
SivanDoveh / TSVLC
Repository for the paper: Teaching Structured Vision & Language Concepts to Vision & Language Models
☆47 Updated 2 years ago
haoyu-bu / CAFe
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆24 Updated 8 months ago
hrtang22 / MUSE
Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"
☆25 Updated 10 months ago
zhengrongz / AoTD
[CVPR 2025] Official PyTorch code of "Enhancing Video-LLM Reasoning via Agent-of-Thoughts Distillation".
☆52 Updated 6 months ago