yongliang-wu / ExploreCfg
[NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning
☆33Updated last month
Alternatives and similar repositories for ExploreCfg:
Users who are interested in ExploreCfg are comparing it to the repositories listed below.
- [AAAI2025] Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient☆16Updated last month
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆16Updated 4 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆43Updated 9 months ago
- ☆97Updated last month
- ☆25Updated 6 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos☆36Updated 8 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆63Updated 6 months ago
- [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations☆126Updated 9 months ago
- ☆63Updated last month
- [NeurIPS 2024] Visual Perception by Large Language Model’s Weights☆35Updated 3 months ago
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval☆51Updated last month
- ☆25Updated 4 months ago
- LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant☆39Updated last month
- [Preprint] TRACE: Temporal Grounding Video LLM via Causal Event Modeling☆58Updated 2 weeks ago
- ☆15Updated last month
- MMICL, a state-of-the-art VLM with in-context learning ability, from ICL, PKU☆44Updated last year
- This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin…☆58Updated 5 months ago
- The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models"☆194Updated 9 months ago
- The official implementation of 《MLLMs-Augmented Visual-Language Representation Learning》☆31Updated 10 months ago
- ☆24Updated 8 months ago
- Official PyTorch code of "Unlocking Video-LLM via Agent-of-Thoughts Distillation".☆13Updated last month
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆44Updated 6 months ago
- Code for paper "LLMs Can Evolve Continually on Modality for X-Modal Reasoning" NeurIPS2024☆29Updated last month
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆31Updated 10 months ago
- [Preprint] Number it: Temporal Grounding Videos like Flipping Manga☆52Updated last month
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆19Updated last month
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval☆74Updated 9 months ago
- [TPAMI reviewing] Towards Visual Grounding: A Survey☆42Updated last week
- [NeurIPS2024] Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models"☆137Updated last week