yilin-bao / unofficial-SiameseMAE

Unofficial PyTorch implementation of Siamese Masked Autoencoder

☆9

Alternatives and similar repositories for unofficial-SiameseMAE

Users who are interested in unofficial-SiameseMAE are comparing it to the libraries listed below.

Dmmm1997 / SimVG
[NeurIPS2024] - SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
☆85 Updated 2 months ago
RobertLuo1 / NeurIPS2023_SOC
[NeurIPS 2023] The official implementation of SOC: Semantic-Assisted Object Cluster for Referring Video Object Segmentation
☆32 Updated last year
Mr-Bigworth / MMCA
☆23 Updated last month
mc-lan / Text4Seg
[ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation
☆107 Updated last week
cilinyan / ReVOS-api
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆17 Updated last year
Dmmm1997 / C3VG
[AAAI2025 selected as oral] - Multi-task Visual Grounding with Coarse-to-Fine Consistency Constraints
☆30 Updated 3 weeks ago
fanghaook / Awesome-Video-Instance-Segmentation
Awesome video instance segmentation papers
☆43 Updated 3 weeks ago
yongliu20 / SCAN
[CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration"
☆72 Updated 10 months ago
clownrat6 / OpenVIS
[AAAI 2025] Open-vocabulary Video Instance Segmentation Codebase built upon Detectron2, which is really easy to use.
☆23 Updated 7 months ago
Tavarich / Awesome-Referring-Video-Object-Segmentation
A list of referring video object segmentation papers
☆45 Updated last month
wangf3014 / SCLIP
Official implementation of SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
☆161 Updated 9 months ago
jiaosiyu1999 / MAFT
☆58 Updated 11 months ago
983632847 / All-in-One
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
☆17 Updated 5 months ago
zhengli97 / ATPrompt
[ICCV 2025] Official PyTorch Code for "Advancing Textual Prompt Learning with Anchored Attributes"
☆82 Updated 2 weeks ago
cilinyan / VISA
[ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model
☆182 Updated 11 months ago
mc-lan / Awesome-MLLM-Segmentation
A curated list of publications on image and video segmentation leveraging Multimodal Large Language Models (MLLMs), highlighting state-of…
☆109 Updated last week
SitongGong / VRS-HQ
High Quality Video Reasoning Segmentation
☆31 Updated 2 weeks ago
HVision-NKU / Cascade-CLIP
Official implementation of ICML 2024 Cascade-CLIP: Cascaded Vision-Language Embeddings Alignment for Zero-Shot Semantic Segmentation
☆51 Updated 11 months ago
SuleBai / SC-CLIP
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
☆52 Updated 2 months ago
EdenGabriel / TaskWeave
[CVPR 2024 Accepted] TaskWeave: Decoupling and Inter-Task Feedback for Joint Moment Retrieval and Highlight Detection
☆26 Updated 10 months ago
sunzc-sunny / refdrone
RefDrone: A Challenging Benchmark for Drone Scene Referring Expression Comprehension
☆14 Updated 3 weeks ago
linhuixiao / HiVG
[ACM MM 2024] Hierarchical Multimodal Fine-grained Modulation for Visual Grounding.
☆52 Updated 3 months ago
mlvlab / SpeaQ
Official PyTorch implementation of "Groupwise Query Specialization and Quality-Aware Multi-Assignment for Transformer-based Visual Relati…
☆37 Updated last year
qzp2018 / MCLN
A PyTorch implementation of MCLN, proposed in the paper "Multi-branch Collaborative Learning Network for 3D Visual Grounding" (ECCV…
☆20 Updated 9 months ago
xmed-lab / TAM
[ICCV25 Oral] Token Activation Map to Visually Explain Multimodal LLMs
☆47 Updated last week
appletea233 / AL-Ref-SAM2
[AAAI 2025] AL-Ref-SAM 2: Unleashing the Temporal-Spatial Reasoning Capacity of GPT for Training-Free Audio and Language Referenced Video…
☆85 Updated 7 months ago
SooLab / CoTDet
[ICCV2023] CoTDet: Affordance Knowledge Prompting for Task Driven Object Detection
☆16 Updated 3 months ago
FishAndWasabi / Real-LOD
Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"
☆27 Updated 3 months ago
seilk / LocalizationHeads
[CVPR 2025 Highlight] Your Large Vision-Language Model Only Needs A Few Attention Heads For Visual Grounding
☆17 Updated last month
HVision-NKU / MaskCLIPpp
Official repository of the paper "High-Quality Mask Tuning Matters for Open-Vocabulary Segmentation"
☆37 Updated 4 months ago