Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Grounding"
☆52Dec 30, 2023Updated 2 years ago
Alternatives and similar repositories for ADPN-MM
Users that are interested in ADPN-MM are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning☆21Feb 19, 2025Updated last year
- [AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval☆20May 10, 2024Updated last year
- [WACV 2025] Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection☆17Mar 23, 2025Updated last year
- [2021 MultiMedia] CONQUER: Contextual Query-aware Ranking for Video Corpus Moment Retrieval☆42Sep 23, 2021Updated 4 years ago
- UniMD: Towards Unifying Moment retrieval and temporal action Detection☆57Jul 5, 2024Updated last year
- Wordpress hosting with auto-scaling - Free Trial • AdFully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
- Source code of our MM'22 paper Partially Relevant Video Retrieval☆55Nov 4, 2024Updated last year
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Sep 25, 2024Updated last year
- Official Pytorch Implementation of 'BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos'☆35Feb 26, 2025Updated last year
- Chinese CLIP models with SOTA performance.☆60Aug 28, 2023Updated 2 years ago
- This is the official implementation of RGNet: A Unified Retrieval and Grounding Network for Long Videos☆19Mar 3, 2025Updated last year
- [ Arxiv 2023 ] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection"☆15Aug 30, 2023Updated 2 years ago
- ☆14Dec 11, 2024Updated last year
- ☆12Mar 23, 2026Updated 3 weeks ago
- Official implement of MIA-DPO☆71Jan 23, 2025Updated last year
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- DanceCamAnimator: Keyframe-Based Controllable 3D Dance Camera Synthesis. [ACMMM 2024] Official PyTorch implementation☆39Sep 24, 2024Updated last year
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- [ECCV2024] The official implementation of "Listen to Look into the Future: Audio-Visual Egocentric Gaze Anticipation".☆13Feb 24, 2025Updated last year
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models☆143Aug 21, 2025Updated 7 months ago
- [AAAI 2025] VTG-LLM: Integrating Timestamp Knowledge into Video LLMs for Enhanced Video Temporal Grounding☆126Dec 10, 2024Updated last year
- [CVPR 2025, Highlight] The official implementation of the paper "Unleashing In-context Learning of Autoregressive Models for Few-shot Ima…☆27Jun 6, 2025Updated 10 months ago
- ☆11Aug 7, 2024Updated last year
- Precision Search through Multi-Style Inputs☆74Jul 30, 2025Updated 8 months ago
- 阅读顺序、Layoutreader☆19May 8, 2025Updated 11 months ago
- Managed Database hosting by DigitalOcean • AdPostgreSQL, MySQL, MongoDB, Kafka, Valkey, and OpenSearch available. Automatically scale up storage and focus on building your apps.
- An official implementation for "X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval"☆182Apr 6, 2024Updated 2 years ago
- [AAAI 2024] UniAP: Towards Universal Animal Perception in Vision via Few-shot Learning☆12Dec 10, 2023Updated 2 years ago
- SenseVoice-python: A enterprise-grade open source multi-language asr system from funasr opensource with onnxruntime☆112Oct 6, 2025Updated 6 months ago
- EIVideo- 交互式智能视频标注工具,几次鼠标点击即可解放双手,让视频标注更加轻松☆33Jul 4, 2022Updated 3 years ago
- Spatial-Temporal Knowledge-Embedded Transformer for Video Scene Graph Generation (TIP 2024, ACM MM 2023)☆20Mar 13, 2024Updated 2 years ago
- Open deep learning compiler stack for cpu, gpu and specialized accelerators☆19Apr 1, 2026Updated 2 weeks ago
- [CVPR 2024] Adapting Short-Term Transformers for Action Detection in Untrimmed Videos☆12Jun 11, 2024Updated last year
- [CVPR 2025] Official implementation of paper "Multi-Granularity Class Prototype Topology Distillation for Class-Incremental Source-Free …☆17Aug 26, 2025Updated 7 months ago
- Span-based Localizing Network for Natural Language Video Localization (ACL 2020)☆112Oct 15, 2021Updated 4 years ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Videoshop: Localized Semantic Video Editing with Noise-Extrapolated Diffusion Inversion☆46Aug 1, 2024Updated last year
- ☆12Jan 10, 2025Updated last year
- ☆16Dec 4, 2024Updated last year
- Mental stress has become a standard part of day-to-day life. However, experiencing long-term and high-level stress affects the daily life…☆15Dec 8, 2022Updated 3 years ago
- Code for our paper: "Where's Waldo: Diffusion Features For Personalized Segmentation and Retrieval".☆14Feb 26, 2025Updated last year
- Code for HAR-GCNN: Deep Graph CNNs for Human Activity Recognition From Highly Unlabeled Mobile Sensor Data, IEEE PerCom CoMoRea 2022☆13May 9, 2022Updated 3 years ago
- [CVPR 2024] Customize your NeRF: Adaptive Source Driven 3D Scene Editing via Local-Global Iterative Training☆44Apr 13, 2024Updated 2 years ago