see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
☆43 · Updated last year
Alternatives and similar repositories for sesame
Users that are interested in sesame are comparing it to the libraries listed below
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ☆79 · Updated 10 months ago
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ☆43 · Updated 8 months ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆40 · Updated 5 months ago
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ☆89 · Updated 5 months ago
- ☆58 · Updated 2 years ago
- (ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation ☆38 · Updated last year
- ☆32 · Updated 11 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ☆103 · Updated 3 months ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ☆35 · Updated last year
- ☆32 · Updated last year
- ☆38 · Updated 2 months ago
- ☆23 · Updated last year
- [ECCV2024] ProxyCLIP: Proxy Attention Improves CLIP for Open-Vocabulary Segmentation ☆103 · Updated 5 months ago
- [CVPR 2024 Highlight] ImageNet-D ☆43 · Updated 11 months ago
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆42 · Updated 9 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆59 · Updated last month
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ☆73 · Updated 11 months ago
- Official code for paper "GRIT: Teaching MLLMs to Think with Images" ☆128 · Updated last month
- [CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images. ☆29 · Updated last year
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ☆42 · Updated 2 weeks ago
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆25 · Updated 5 months ago
- [ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects ☆51 · Updated 11 months ago
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ☆98 · Updated last year
- Official repository of paper "Subobject-level Image Tokenization" (ICML-25) ☆85 · Updated 2 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆87 · Updated 3 months ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples ☆30 · Updated 9 months ago
- [ECCV'24] Official PyTorch implementation of In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation ☆46 · Updated 11 months ago
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" ☆20 · Updated 10 months ago
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models ☆18 · Updated 3 months ago
- cliptrase ☆45 · Updated last year