see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆41 Updated last year
Alternatives and similar repositories for sesame
Users who are interested in sesame are comparing it to the repositories listed below.
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆41 Updated 3 months ago
- ☆31 Updated 8 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆95 Updated 3 weeks ago
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ☆84 Updated 2 months ago
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆91 Updated 2 months ago
- ☆23 Updated last year
- [NeurIPS'24] Unleashing the Potential of the Diffusion Model in Few-shot Semantic Segmentation (Diffews) ☆38 Updated 2 months ago
- cliptrase ☆35 Updated 9 months ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆35 Updated last month
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆43 Updated 5 months ago
- ☆32 Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆41 Updated 6 months ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples ☆23 Updated 6 months ago
- [CVPR 2025] Few-shot Recognition via Stage-Wise Retrieval-Augmented Finetuning ☆19 Updated this week
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆72 Updated 9 months ago
- [CVPR 2024 Highlight] ImageNet-D ☆43 Updated 8 months ago
- (ECCV 2024) Can OOD Object Detectors Learn from Foundation Models? ☆25 Updated 6 months ago
- ☆58 Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆77 Updated 8 months ago
- PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training" ☆34 Updated last year
- 🔥 Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospective Resamplin… ☆35 Updated last week
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models ☆16 Updated 3 weeks ago
- ☆31 Updated 5 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆51 Updated 5 months ago
- (ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation ☆38 Updated last year
- [ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects ☆48 Updated 9 months ago
- [NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos ☆122 Updated 5 months ago
- Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation ☆48 Updated last month
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories ☆54 Updated 3 months ago
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆46 Updated 2 months ago