uvavision / SelfEQ
[CVPR 2024] Code for "Improved Visual Grounding through Self-Consistent Explanations".
☆26 · Updated last year
Alternatives and similar repositories for SelfEQ
Users interested in SelfEQ are comparing it to the libraries listed below.
- (NeurIPS 2024 Spotlight) TOPA: Extend Large Language Models for Video Understanding via Text-Only Pre-Alignment ☆31 · Updated 10 months ago
- VisualGPTScore for visio-linguistic reasoning ☆27 · Updated last year
- [ICLR 2024, Spotlight] Sentence-level Prompts Benefit Composed Image Retrieval ☆85 · Updated last year
- The official implementation of "MLLMs-Augmented Visual-Language Representation Learning" ☆31 · Updated last year
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ☆39 · Updated 4 months ago
- Official PyTorch code of GroundVQA (CVPR'24) ☆61 · Updated 10 months ago
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval ☆53 · Updated 8 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆60 · Updated last year
- [CVPR 2023 Highlight & TPAMI] Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning ☆121 · Updated 7 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆63 · Updated last year
- Composed Video Retrieval ☆58 · Updated last year
- This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World"… ☆47 · Updated last year
- Turning to Video for Transcript Sorting ☆48 · Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆21 · Updated 4 months ago
- ICLR'24 Official Implementation of Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization ☆73 · Updated last year
- [CVPR 2023 & IJCV 2025] Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation ☆62 · Updated last week
- [CVPR 2023] The code for "Position-guided Text Prompt for Vision-Language Pre-training" ☆152 · Updated 2 years ago
- ☆62 · Updated 2 years ago
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆50 · Updated 4 months ago
- ☆16 · Updated last year
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆40 · Updated 5 months ago
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images ☆28 · Updated last year
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆56 · Updated last year
- [AAAI 2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral) ☆39 · Updated last year
- (ICML 2024) Improve Context Understanding in Multimodal Large Language Models via Multimodal Composition Learning ☆27 · Updated 10 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆58 · Updated 2 years ago
- Code implementation of the paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval" (AAAI 2025) ☆21 · Updated 6 months ago
- Winner solution to the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop) ☆29 · Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆78 · Updated last year
- A reading list of papers about Visual Grounding ☆32 · Updated 2 years ago