MCG-NJU / Dynamic-MDETR
[TPAMI 2024] Dynamic MDETR: A Dynamic Multimodal Transformer Decoder for Visual Grounding
☆24 Updated 4 months ago
Alternatives and similar repositories for Dynamic-MDETR:
Users who are interested in Dynamic-MDETR are comparing it to the repositories listed below.
- [ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding. ☆35 Updated last week
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati… ☆63 Updated 7 months ago
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding ☆45 Updated 7 months ago
- This repo holds the official code and data for "Beyond Literal Descriptions: Understanding and Locating Open-World Objects Aligned with H… ☆17 Updated 8 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ☆31 Updated last month
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation ☆29 Updated 10 months ago
- UniMD: Towards Unifying Moment retrieval and temporal action Detection ☆41 Updated 6 months ago
- ☆34 Updated last year
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding. ☆115 Updated last week
- A list of referring video object segmentation papers ☆23 Updated this week
- [BMVC 2023] Zero-shot Composed Text-Image Retrieval ☆51 Updated 2 months ago
- The official repository for ICLR2024 paper "FROSTER: Frozen CLIP is a Strong Teacher for Open-Vocabulary Action Recognition" ☆66 Updated 2 weeks ago
- Composed Video Retrieval ☆49 Updated 8 months ago
- [T-PAMI 2023] Temporal Perceiver: A General Architecture for Arbitrary Boundary Detection ☆35 Updated last year
- [CVPR 2024] Do you remember? Dense Video Captioning with Cross-Modal Memory Retrieval ☆48 Updated 7 months ago
- [CVPR 2024] Official PyTorch implementation of the paper "One For All: Video Conversation is Feasible Without Video Instruction Tuning" ☆29 Updated 11 months ago
- A Versatile Video-LLM for Long and Short Video Understanding with Superior Temporal Localization Ability ☆78 Updated 2 months ago
- [AAAI 2024] AVSegFormer: Audio-Visual Segmentation with Transformer ☆60 Updated last month
- ☆37 Updated 9 months ago
- ICLR'24 Official Implementation of Composed Image Retrieval with Text Feedback via Multi-grained Uncertainty Regularization ☆69 Updated last year
- Official Codes for Fine-Grained Visual Prompting, NeurIPS 2023 ☆48 Updated 11 months ago
- Official repo for our ICML 23 paper: "Multi-Modal Classifiers for Open-Vocabulary Object Detection" ☆87 Updated last year
- PyTorch Code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆62 Updated 7 months ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory" ☆66 Updated 3 months ago
- Official code for WACV 2024 paper, "Annotation-free Audio-Visual Segmentation" ☆28 Updated 3 months ago
- [CVPR 2024 Highlight] Official implementation of the paper: Cooperation Does Matter: Exploring Multi-Order Bilateral Relations for Audio-… ☆38 Updated 6 months ago
- ☆28 Updated last year
- The official repository for paper "PruneVid: Visual Token Pruning for Efficient Video Large Language Models". ☆28 Updated 2 weeks ago
- (CVPR2024) MeaCap: Memory-Augmented Zero-shot Image Captioning ☆43 Updated 5 months ago
- ☆23 Updated 3 months ago