yilin-bao / unofficial-SiameseMAE
Unofficial PyTorch implementation of Siamese Masked Autoencoders
☆9 · Updated last year
Alternatives and similar repositories for unofficial-SiameseMAE
Users who are interested in unofficial-SiameseMAE are comparing it to the repositories listed below.
- [NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion ☆85 · Updated 2 months ago
- [NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation ☆32 · Updated last year
- ☆23 · Updated last month
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ☆107 · Updated last week
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆17 · Updated last year
- [AAAI2025 selected as oral] - Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints ☆30 · Updated 3 weeks ago
- Awesome video instance segmentation papers ☆43 · Updated 3 weeks ago
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆72 · Updated 10 months ago
- [AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use. ☆23 · Updated 7 months ago
- A list of referring video object segmentation papers ☆45 · Updated last month
- Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference ☆161 · Updated 9 months ago
- ☆58 · Updated 11 months ago
- All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment ☆17 · Updated 5 months ago
- [ICCV 2025] Official PyTorch Code for "Advancing Textual Prompt Learning with Anchored Attributes" ☆82 · Updated 2 weeks ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆182 · Updated 11 months ago
- A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of… ☆109 · Updated last week
- High Quality Video Reasoning Segmentation ☆31 · Updated 2 weeks ago
- Official implementation of ICML 2024 Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation ☆51 · Updated 11 months ago
- Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation ☆52 · Updated 2 months ago
- [CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection ☆26 · Updated 10 months ago
- RefDrone: A Challenging Benchmark for Drone Scene Referring Expression Comprehension ☆14 · Updated 3 weeks ago
- [ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding. ☆52 · Updated 3 months ago
- Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relati… ☆37 · Updated last year
- This is a PyTorch implementation of MCLN proposed by our paper "Multi-branch Collaborative Learning Network for 3D Visual Grounding" (ECCV… ☆20 · Updated 9 months ago
- [ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs ☆47 · Updated last week
- [AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video… ☆85 · Updated 7 months ago
- [ICCV2023] CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection ☆16 · Updated 3 months ago
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow" ☆27 · Updated 3 months ago
- [CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding ☆17 · Updated last month
- Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation" ☆37 · Updated 4 months ago