lxa9867 / r2bench
[ECCV 2024] R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations
☆10 Updated 9 months ago
Alternatives and similar repositories for r2bench:
Users who are interested in r2bench are comparing it to the repositories listed below:
- [CVPR 2024] "Towards Robust Audiovisual Segmentation in Complex Environments with Quantization-based Semantic Decomposition" ☆13 Updated last year
- ☆11 Updated 10 months ago
- [NeurIPS 2023] Official Implementation of "PaintSeg: Painting Pixels for Training-free Segmentation" ☆14 Updated last year
- ☆35 Updated this week
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆39 Updated 5 months ago
- [EMNLP 2024] Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality ☆16 Updated 6 months ago
- ☆28 Updated 3 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆34 Updated 10 months ago
- This is the project for 'USG'. ☆11 Updated 3 weeks ago
- ☆61 Updated last year
- [NeurIPS'24] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (Diffews) ☆35 Updated 3 weeks ago
- Can 3D Vision-Language Models Truly Understand Natural Language? ☆21 Updated last year
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning" ☆12 Updated last month
- ☆41 Updated 7 months ago
- The official implementation of JM3D. ☆30 Updated last week
- Official code for the paper, "TaCA: Upgrading Your Visual Foundation Model with Task-agnostic Compatible Adapter". ☆16 Updated last year
- ☆14 Updated 3 weeks ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆43 Updated 3 months ago
- EPCFormer: Expression Prompt Collaboration Transformer for Universal Referring Video Object Segmentation ☆9 Updated last year
- ☆21 Updated last year
- ☆19 Updated 2 weeks ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆37 Updated 10 months ago
- ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation (CVPR'25) ☆11 Updated last month
- (NeurIPS 2024) What Makes CLIP More Robust to Long-Tailed Pre-Training Data? A Controlled Study for Transferable Insights ☆25 Updated 6 months ago
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training" ☆35 Updated last year
- [ECCV 2024] Official implementation of the paper "Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning… ☆27 Updated 2 months ago
- ☆30 Updated 3 weeks ago
- ☆9 Updated 11 months ago
- FleVRS: Towards Flexible Visual Relationship Segmentation, NeurIPS 2024 ☆20 Updated 4 months ago
- A framework named B^2-DiffuRL for RL-based diffusion model fine-tuning. ☆29 Updated last month