[Accepted by ACM MM'25] MS-DETR: Towards Effective Video Moment Retrieval and Highlight Detection by Joint Motion-Semantic Learning
★38 · Sep 26, 2025 · Updated 6 months ago
Alternatives and similar repositories for MS-DETR
Users that are interested in MS-DETR are comparing it to the libraries listed below.
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ★24 · Jan 1, 2026 · Updated 2 months ago
- Repo for paper "MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding". ★39 · Jun 9, 2025 · Updated 9 months ago
- Proposed fuzzy reward model with GRPO to improve VLM's abilities in crowd counting task. ★21 · Apr 11, 2025 · Updated 11 months ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ★25 · Feb 2, 2025 · Updated last year
- ★14 · Oct 30, 2023 · Updated 2 years ago
- (CVPR25) Exploring Contextual Attribute Density in Referring Expression Counting ★18 · Dec 3, 2025 · Updated 3 months ago
- Decoupled Memory Selection for Multi-target Video Segmentation of SAM3 ★40 · Jan 16, 2026 · Updated 2 months ago
- [AAAI 2024] GMMFormer: Gaussian-Mixture-Model Based Transformer for Efficient Partially Relevant Video Retrieval ★20 · May 10, 2024 · Updated last year
- [NeurIPS 2024] OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling. ★31 · Nov 13, 2025 · Updated 4 months ago
- ICCV'23 Dual Learning with Dynamic Knowledge Distillation for Partially Relevant Video Retrieval ★19 · Aug 22, 2025 · Updated 7 months ago
- Segmentation assisted U-shaped multi-scale transformer for crowd counting ★22 · Jun 9, 2024 · Updated last year
- code for GuidedNet ★13 · Feb 16, 2023 · Updated 3 years ago
- [EMNLP 2025 Industry] Datasets and Recipes for Video Temporal Grounding via Reinforcement Learning ★36 · Oct 22, 2025 · Updated 5 months ago
- [CVPR 2023] HierVL Learning Hierarchical Video-Language Embeddings ★46 · Aug 14, 2023 · Updated 2 years ago
- M-SpecGene: Generalized Foundation Model for RGBT Multispectral Vision (ICCV 2025) ★31 · Nov 19, 2025 · Updated 4 months ago
- ★25 · Mar 12, 2026 · Updated 2 weeks ago
- The ODinMJ RGB-T dataset is an object detection RGB-T dataset for mountain jungle scenes. ★29 · May 29, 2024 · Updated last year
- ★49 · Sep 13, 2024 · Updated last year
- Official Implementation of SnAG (CVPR 2024) ★57 · Apr 26, 2025 · Updated 11 months ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding ★83 · Dec 14, 2025 · Updated 3 months ago
- Model for the manuscript named "Spectral Response Function Guided Deep Optimization-driven Network for Spectral Super-resolution" pbulish… ★16 · Feb 1, 2021 · Updated 5 years ago
- Dataset & Code for ACM Multimedia 2023 paper. "SemanticRT: A Large-Scale Dataset and Method for Robust Semantic Segmentation in Multispec… ★15 · Apr 14, 2025 · Updated 11 months ago
- Pytorch implementation of the paper 'Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Super… ★20 · Jan 19, 2024 · Updated 2 years ago
- RGBD Pretraining code used in DFormer [ICLR 2024] ★21 · Jul 8, 2025 · Updated 8 months ago
- [AAAI2025 selected as oral] - Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints ★44 · Jul 2, 2025 · Updated 8 months ago
- PyTorch implementation for our ICLR 2025 paper State Space Model Meets Transformer: A New Paradigm for 3D Object Detection ★41 · Mar 27, 2025 · Updated last year
- This repository is related to 'Intriguing Properties of Hyperbolic Embeddings in Vision-Language Models', published at TMLR (2024), https… ★22 · Jul 5, 2024 · Updated last year
- Code for "A general image fusion framework using multi-task semi-supervised learning" ★22 · Aug 10, 2024 · Updated last year
- batchboost is a variation on MixUp that instead of mixing just two images, mixes many images together. ★44 · Jan 26, 2020 · Updated 6 years ago
- FineMotion: A Dataset and Benchmark with both Spatial and Temporal Annotation for Fine-grained Motion Generation and Editing ★17 · Mar 4, 2025 · Updated last year
- Codes of Interpreting Low-level Vision Models with Causal Effect Maps ★34 · Sep 9, 2025 · Updated 6 months ago
- Differentiable Hierarchical Visual Tokenization ★43 · Nov 26, 2025 · Updated 4 months ago
- [WACV 2023] A Simple and Powerful Global Optimization for Unsupervised Video Object Segmentation ★23 · Dec 2, 2022 · Updated 3 years ago
- ★10 · Jan 6, 2020 · Updated 6 years ago
- ★13 · Mar 10, 2018 · Updated 8 years ago
- (non-English description, unreadable in source) ★114 · Mar 21, 2026 · Updated last week
- a practicable Pytorch framework used in Deep Learning. ★25 · Feb 27, 2025 · Updated last year
- Official pytorch repository for CG-DETR "Correlation-guided Query-Dependency Calibration in Video Representation Learning for Temporal Gr… ★153 · Aug 21, 2024 · Updated last year
- This is the official code for NeurIPS 2023 paper "Learning Unseen Modality Interaction" ★18 · Jan 22, 2024 · Updated 2 years ago