jyrao / MatchTime
MatchTime: Towards Automatic Soccer Game Commentary Generation
☆21 · Updated 3 weeks ago
Related projects:
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆49 · Updated last week
- The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge. ☆18 · Updated last month
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning" ☆75 · Updated 6 months ago
- Large Language Models are Temporal and Causal Reasoners for Video Question Answering (EMNLP 2023) ☆71 · Updated last month
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆68 · Updated last month
- Official Implementation of "The Surprising Effectiveness of Multimodal Large Language Models for Video Moment Retrieval" ☆34 · Updated last month
- Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs". ☆39 · Updated 3 weeks ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models ☆70 · Updated 2 weeks ago
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval ☆42 · Updated last year
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆23 · Updated 6 months ago
- VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆73 · Updated 2 months ago
- Repo for the paper "ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models" ☆44 · Updated 3 weeks ago
- A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models! ☆114 · Updated 8 months ago
- RaTEScore: A Metric for Entity-Aware Radiology Text Similarity ☆23 · Updated 2 months ago
- Official implementation of "Why are Visually-Grounded Language Models Bad at Image Classification?" ☆32 · Updated 2 months ago
- FreeVA: Offline MLLM as Training-Free Video Assistant ☆42 · Updated 3 months ago
- [Preprint] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding ☆49 · Updated last month
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension ☆27 · Updated 5 months ago
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆37 · Updated 2 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆38 · Updated 2 months ago
- [CVPR 2024] Bridging the Gap: A Unified Video Comprehension Framework for Moment Retrieval and Highlight Detection ☆66 · Updated 2 months ago
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆33 · Updated 4 months ago
- Official PyTorch implementation of the paper "Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner" ☆15 · Updated last year
- HallE-Control: Controlling Object Hallucination in LMMs ☆24 · Updated 5 months ago
- [ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds ☆80 · Updated 2 months ago
- 🌀 R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding (ECCV 2024) ☆52 · Updated 2 months ago