Jiaxuan-Li / EVCap
[CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension
☆34 · Updated 7 months ago
Related projects
Alternatives and complementary repositories for EVCap
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆37 · Updated 3 months ago
- Source code of our CVPR2024 paper TeachCLIP for Text-to-Video Retrieval ☆19 · Updated 3 weeks ago
- [CVPR' 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆41 · Updated 3 months ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆58 · Updated 4 months ago
- NegCLIP. ☆26 · Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆45 · Updated 5 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆51 · Updated 2 months ago
- [NeurIPS 2024] Calibrated Self-Rewarding Vision Language Models ☆43 · Updated 5 months ago
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆39 · Updated 4 months ago
- HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024) ☆41 · Updated 4 months ago
- MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU ☆41 · Updated last year
- Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023) ☆73 · Updated 3 months ago
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆95 · Updated 9 months ago
- [ICML2024] Repo for the paper "Evaluating and Analyzing Relationship Hallucinations in Large Vision-Language Models" ☆20 · Updated last month
- 【ICLR 2024, Spotlight】Sentence-level Prompts Benefit Composed Image Retrieval ☆68 · Updated 7 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆34 · Updated 6 months ago
- HallE-Control: Controlling Object Hallucination in LMMs ☆28 · Updated 7 months ago
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval ☆44 · Updated last year
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆28 · Updated 8 months ago
- Official github repo for ICCV2023 paper 'Multi-event Video-Text Retrieval' ☆18 · Updated 9 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆48 · Updated 5 months ago
- [ECCV 2024] EgoCVR: An Egocentric Benchmark for Fine-Grained Composed Video Retrieval ☆27 · Updated 2 months ago
- [EMNLP'23] The official GitHub page for "Evaluating Object Hallucination in Large Vision-Language Models" ☆73 · Updated 7 months ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding" ☆68 · Updated 6 months ago
- [ICLR 2024] Analyzing and Mitigating Object Hallucination in Large Vision-Language Models ☆135 · Updated 6 months ago
- Composed Video Retrieval ☆45 · Updated 6 months ago