lxa9867 / r2bench
[ECCV 2024] R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
☆10 · Updated 8 months ago
Alternatives and similar repositories for r2bench:
Users interested in r2bench are comparing it to the repositories listed below:
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆13 · Updated last year
- ☆11 · Updated 9 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆20 · Updated last month
- [NeurIPS 2023] Official Implementation of "PaintSeg: Painting Pixels for Training-free Segmentation" ☆14 · Updated last year
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆16 · Updated 6 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆37 · Updated 4 months ago
- [TCSVT 2024] Temporally Consistent Referring Video Object Segmentation with Hybrid Memory ☆16 · Updated this week
- [NeurIPS 2024] What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆25 · Updated 5 months ago
- ☆27 · Updated 3 months ago
- The official implementation of JM3D. ☆29 · Updated last year
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024 ☆14 · Updated 6 months ago
- EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation ☆9 · Updated last year
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆16 · Updated this week
- ☆13 · Updated 4 months ago
- [IEEE TCSVT] Official PyTorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆40 · Updated 3 months ago
- ☆41 · Updated 6 months ago
- [NeurIPS 2024] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (Diffews) ☆35 · Updated 2 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆33 · Updated 10 months ago
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter". ☆16 · Updated last year
- ☆31 · Updated this week
- [ECCV 2024] Code for Betrayed by Attention: A Simple yet Effective Approach for Self-supervised Video Object Segmentation ☆33 · Updated last month
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 · Updated last year
- ☆59 · Updated last year
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆41 · Updated 2 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆37 · Updated 9 months ago
- [CVPR 2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories ☆29 · Updated last month
- ImageGen-CoT: Enhancing Text-to-Image In-context Learning with Chain-of-Thought Reasoning ☆26 · Updated last week
- This repo contains evaluation code for the paper "AV-Odyssey: Can Your Multimodal LLMs Really Understand Audio-Visual Information?" ☆23 · Updated 3 months ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples ☆23 · Updated 4 months ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆27 · Updated 2 weeks ago