xfactlab / I0T
[ACL Main 2025] I0T: Embedding Standardization Method Towards Zero Modality Gap
☆10 Updated last month
Alternatives and similar repositories for I0T
Users who are interested in I0T are comparing it to the libraries listed below.
- This repository contains the implementation of our NeurIPS'24 paper "Temporal Sentence Grounding with Relevance Feedback in Videos" ☆10 Updated 7 months ago
- ☆12 Updated 6 months ago
- (ICML 2025) Rethinking Chain-of-Thought from the Perspective of Self-Training ☆9 Updated 5 months ago
- [CVPR 2025] Augmenting Multimodal LLMs with Self-Reflective Tokens for Knowledge-based Visual Question Answering ☆38 Updated this week
- Adapt MLLMs to Domains via Post-Training ☆9 Updated 6 months ago
- [EMNLP 2024] IFCap: Image-like Retrieval and Frequency-based Entity Filtering for Zero-shot Captioning ☆14 Updated 2 months ago
- ICCV 2023 (Oral) Open-domain Visual Entity Recognition Towards Recognizing Millions of Wikipedia Entities ☆41 Updated last month
- ☆8 Updated 5 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation ☆89 Updated 7 months ago
- Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature. ☆25 Updated 5 months ago
- [NeurIPS 2024] Mixture of Experts for Audio-Visual Learning ☆15 Updated 6 months ago
- ☆18 Updated 6 months ago
- ☆52 Updated last year
- Official implementation of CVPR 2024 paper "Prompt Learning via Meta-Regularization". ☆27 Updated 4 months ago
- ☆16 Updated 2 months ago
- Visual Delta Generator with Large Multi-modal Model for Semi-supervised Composed Image Retrieval - CVPR2024 ☆18 Updated last year
- ☆20 Updated 5 months ago
- Official Repo for FoodieQA paper (EMNLP 2024) ☆16 Updated 3 weeks ago
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning ☆61 Updated 5 months ago
- [ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models" ☆51 Updated 10 months ago
- SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context ☆5 Updated 6 months ago
- [ICLR 2025] Causal Graphical Models for Vision-Language Compositional Understanding ☆9 Updated 3 months ago
- [NeurIPS 2023] A faithful benchmark for vision-language compositionality ☆82 Updated last year
- [ICML2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" ☆21 Updated 6 months ago
- A simple PyTorch implementation of a CLIP-based baseline for image-text matching. ☆14 Updated 2 years ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding" ☆88 Updated 7 months ago
- Official Repository of Personalized Visual Instruct Tuning ☆31 Updated 4 months ago
- [ACL'25] Repo for paper "M-MAD: Multidimensional Multi-Agent Debate for Advanced Machine Translation Evaluation" ☆12 Updated 5 months ago
- ☆12 Updated 3 months ago
- HallE-Control: Controlling Object Hallucination in LMMs ☆31 Updated last year