minghu0830 / NurViD-benchmark
☆21Updated last year
Alternatives and similar repositories for NurViD-benchmark:
Users that are interested in NurViD-benchmark are comparing it to the libraries listed below
- ☆77Updated last year
- Official PyTorch code of GroundVQA (CVPR'24)☆58Updated 6 months ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition"☆75Updated 2 months ago
- (CVPR 2023) Official implemention of the paper "Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos…☆29Updated last year
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024☆37Updated 3 months ago
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆28Updated 2 weeks ago
- [ Arxiv 2023 ] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection"☆15Updated last year
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆41Updated last year
- [COLING'25] HGCLIP: Exploring Vision-Language Models with Graph Representations for Hierarchical Understanding☆38Updated 4 months ago
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval☆53Updated 9 months ago
- Official repository for "Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting" [CVPR 2023]☆116Updated last year
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆48Updated 11 months ago
- [NeurIPS 2023] Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization☆103Updated last year
- [AAAI'25]: Building a Multi-modal Spatiotemporal Expert for Zero-shot Action Recognition with CLIP☆10Updated last month
- [AAAI 2024] DGL: Dynamic Global-Local Prompt Tuning for Text-Video Retrieval.☆39Updated 5 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"☆33Updated last year
- [CVPR 2025] PyTorch implementation of paper "FLAME: Frozen Large Language Models Enable Data-Efficient Language-Image Pre-training"☆27Updated last week
- ☆36Updated 2 years ago
- [AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos☆23Updated 6 months ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning☆44Updated 7 months ago
- Disentangled Pre-training for Human-Object Interaction Detection☆19Updated 4 months ago
- BEAR: a new BEnchmark on video Action Recognition☆42Updated 11 months ago
- ☆29Updated last year
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"☆67Updated 5 months ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption☆97Updated last year
- ☆19Updated 5 months ago
- Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023☆51Updated last year
- ☆23Updated 2 years ago
- Official code for CVPR 2024 paper, "Audio-Visual Segmentation via Unlabeled Frame Exploitation""☆12Updated 8 months ago
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆81Updated last year