see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
★43 · Updated last year
Alternatives and similar repositories for sesame
Users that are interested in sesame are comparing it to the libraries listed below
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ★40 · Updated 6 months ago
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ★79 · Updated 11 months ago
- (no description) ★39 · Updated 2 months ago
- (no description) ★58 · Updated 2 years ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ★45 · Updated 8 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ★60 · Updated 2 months ago
- (no description) ★32 · Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ★104 · Updated 4 months ago
- [CVPR 2024 Highlight] ImageNet-D ★43 · Updated 11 months ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples ★29 · Updated 10 months ago
- (ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation ★38 · Updated last year
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ★90 · Updated 6 months ago
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models ★19 · Updated 4 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ★44 · Updated 10 months ago
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ★129 · Updated 5 months ago
- (no description) ★32 · Updated last year
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ★107 · Updated 6 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ★20 · Updated 11 months ago
- [ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects ★51 · Updated last year
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ★43 · Updated last month
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ★75 · Updated last year
- Large-Vocabulary Video Instance Segmentation dataset ★95 · Updated last year
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ★29 · Updated last year
- Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images" ★143 · Updated 2 months ago
- (no description) ★23 · Updated last year
- Learning 1D Causal Visual Representation with De-focus Attention Networks ★35 · Updated last year
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ★27 · Updated 8 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ★98 · Updated last year
- Official repository of paper "Subobject-level Image Tokenization" (ICML-25) ★87 · Updated 3 months ago
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation ★47 · Updated last year