qirui-chen / MultiHop-EgoQA
☆10 · Updated last week
Related projects:
- ☆19 · Updated last month
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆49 · Updated last week
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆23 · Updated 6 months ago
- MatchTime: Towards Automatic Soccer Game Commentary Generation ☆21 · Updated 3 weeks ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆38 · Updated 3 months ago
- The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge. ☆18 · Updated last month
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval ☆42 · Updated last year
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆38 · Updated 2 months ago
- Composed Video Retrieval ☆42 · Updated 4 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆33 · Updated 4 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆75 · Updated 6 months ago
- ☆11 · Updated 2 months ago
- RaTEScore: A Metric for Entity-Aware Radiology Text Similarity ☆23 · Updated 2 months ago
- ☆60 · Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆27 · Updated 5 months ago
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆37 · Updated 2 months ago
- [CVPR 2024] Contrasting Intra-Modal and Ranking Cross-Modal Hard Negatives to Enhance Visio-Linguistic Fine-grained Understanding ☆35 · Updated last month
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆49 · Updated 2 months ago
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval. Also, visualization and qb norm search for best performance… ☆28 · Updated 5 months ago
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆68 · Updated last month
- ☆31 · Updated last year
- ☆18 · Updated 8 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆42 · Updated 3 months ago
- Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs". ☆39 · Updated 3 weeks ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" ☆32 · Updated 2 months ago
- ☆43 · Updated 2 months ago
- [arXiv] Calibrated Self-Rewarding Vision Language Models ☆35 · Updated 3 months ago
- ☆61 · Updated 9 months ago
- Official Implementation of SnAG (CVPR 2024) ☆32 · Updated 4 months ago
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner" ☆15 · Updated last year