MCG-NJU / SAM2-Plus
SAM 2++: Tracking Anything at Any Granularity
☆53 Dec 15, 2025 Updated last month
Alternatives and similar repositories for SAM2-Plus
Users that are interested in SAM2-Plus are comparing it to the libraries listed below
- [ICCV'25] Official PyTorch Implementation of "JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers" ☆27 Nov 27, 2025 Updated 2 months ago
- WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning ☆48 Dec 30, 2025 Updated last month
- ☆22 Mar 7, 2025 Updated 11 months ago
- An End-to-End Pipeline for Enhanced French Text-to-Speech with SSML Prosody Control ☆30 Jan 13, 2026 Updated last month
- [ICCV2023] Isomer: Isomerous Transformer for Zero-Shot Video Object Segmentation ☆30 Nov 21, 2023 Updated 2 years ago
- Robust Referring Video Object Segmentation with Cyclic Structural Consistency [ICCV 2023] ☆30 Mar 13, 2024 Updated last year
- FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding. (WACV2025) ☆34 Apr 17, 2025 Updated 9 months ago
- The official implementation of our work Hawkeye: Discovering and Grounding Implicit Anomalous Sentiment in Recon-videos via Scene-enhanc… ☆12 Oct 14, 2024 Updated last year
- Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing" ☆53 Jul 5, 2025 Updated 7 months ago
- Finetuning & extending DiffusionDet to video & pedestrian multi-object-tracking ☆13 Apr 12, 2023 Updated 2 years ago
- [ECCV 2022] Tackling Background Distraction in Video Object Segmentation ☆39 Jun 2, 2025 Updated 8 months ago
- Continual Resilient (CoRe) Optimizer for PyTorch ☆11 Jun 10, 2024 Updated last year
- DisTime: Distribution-based Time Representation for Video Large Language Models. ☆18 Jul 10, 2025 Updated 7 months ago
- The repository of VG-Refiner paper ☆17 Dec 9, 2025 Updated 2 months ago
- [ICLR 2025] IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model ☆37 Nov 27, 2024 Updated last year
- ☆12 Jun 17, 2019 Updated 6 years ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation ☆13 Jun 17, 2024 Updated last year
- ☆10 Apr 7, 2025 Updated 10 months ago
- Code for the paper "IFFNeRF: 6D Pose Estimation from a Single Image and a 3D Gaussian Splatting Model" ☆12 May 26, 2024 Updated last year
- A news based stock scalper using LLM and quant approach ☆14 Jan 16, 2025 Updated last year
- ☆15 Sep 16, 2024 Updated last year
- This library implements functions and classes for mesh registration, data augmentation, and data normalisation. ☆11 Oct 7, 2024 Updated last year
- Code for the paper "No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations" ☆12 Oct 31, 2024 Updated last year
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆37 Oct 9, 2025 Updated 4 months ago
- Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance ☆13 Nov 27, 2025 Updated 2 months ago
- Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning" ☆20 Nov 1, 2025 Updated 3 months ago
- Awesome latest models, datasets and benchmarks on streaming/online video understanding. ☆23 Oct 19, 2025 Updated 3 months ago
- Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement ☆10 Jan 24, 2022 Updated 4 years ago
- MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations ☆31 Oct 15, 2025 Updated 3 months ago
- non-rigid registration in NIMBLE: A Non-rigid Hand Model with Bones and Muscles ☆11 Sep 2, 2022 Updated 3 years ago
- Complete implementation based on omlsa.m ☆14 Nov 26, 2021 Updated 4 years ago
- Official Code for CVPR2025 Paper: LatentHOI: On the Generalizable Hand Object Motion Generation with Latent Hand Diffusion ☆28 Jan 15, 2026 Updated 3 weeks ago
- Aggregate and Discriminate: Pseudo Clips-Guided Boundary Perception for Video Moment Retrieval ☆12 Nov 25, 2024 Updated last year
- ☆11 Jun 6, 2022 Updated 3 years ago
- GHUStereo models are novel real-time stereo matching architectures with a low computation complexity characterized by compact cost volum… ☆29 Dec 14, 2025 Updated last month
- An unofficial implementation of Lite-RTSE, a cost-effective lite model for real-time speech enhancement ☆14 Nov 19, 2023 Updated 2 years ago
- Official Implementation for ACM MM2024 paper "VrdONE: One-stage Video Visual Relation Detection". ☆11 Nov 13, 2024 Updated last year
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆12 Feb 27, 2024 Updated last year
- Agentic Keyframe Search for Video Question Answering ☆15 Apr 7, 2025 Updated 10 months ago