mlvlab / vid-TLDR
Official implementation of CVPR 2024 paper "vid-TLDR: Training Free Token merging for Light-weight Video Transformer".
☆32 Updated 4 months ago
Related projects:
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning ☆35 Updated 11 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆38 Updated 2 months ago
- Official PyTorch code of "Grounded Question-Answering in Long Egocentric Videos", accepted by CVPR 2024. ☆49 Updated this week
- Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 20… ☆18 Updated 4 months ago
- Code for paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos" ☆68 Updated last month
- Distribution-Aware Prompt Tuning for Vision-Language Models (ICCV 2023) ☆36 Updated 9 months ago
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆22 Updated 3 months ago
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023] ☆104 Updated last year
- Official PyTorch implementation of "Test-Time Zero-Shot Temporal Action Localization", CVPR 2024 ☆38 Updated last week
- Composed Video Retrieval ☆42 Updated 4 months ago
- Official Implementation of SnAG (CVPR 2024) ☆32 Updated 4 months ago
- Video-Text Representation Learning via Differentiable Weak Temporal Alignment (CVPR 2022) ☆14 Updated 5 months ago
- Winning solution to the Generic Event Boundary Captioning task in the LOVEU Challenge (CVPR 2023 workshop) ☆29 Updated 8 months ago
- CVPR 2023 Accepted Paper HOICLIP: Efficient Knowledge Transfer for HOI Detection with Vision-Language Models ☆52 Updated 6 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆49 Updated last month
- Official implementation of the paper "Boosting Human-Object Interaction Detection with Text-to-Image Diffusion Model" ☆42 Updated last year
- [AAAI2023] Symbolic Replay: Scene Graph as Prompt for Continual Learning on VQA Task (Oral) ☆36 Updated 5 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆38 Updated 3 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences ☆17 Updated 3 weeks ago
- [NeurIPS 2022 Spotlight] RLIP: Relational Language-Image Pre-training and a series of other methods to solve HOI detection and Scene Grap… ☆71 Updated 3 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆35 Updated last month
- Official PyTorch repository for "Knowing Where to Focus: Event-aware Transformer for Video Grounding" (ICCV 2023) ☆46 Updated last year
- PyTorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆50 Updated 3 months ago