see-say-segment / sesame
🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
★45 · Updated last year
Alternatives and similar repositories for sesame
Users that are interested in sesame are comparing it to the libraries listed below.
- [NeurIPS 2024] Official PyTorch implementation of LoTLIP: Improving Language-Image Pre-training for Long Text Understanding ★46 · Updated 10 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision ★41 · Updated last month
- ★40 · Updated 4 months ago
- [ECCV2024] ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference ★96 · Updated 8 months ago
- ★32 · Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning ★80 · Updated last year
- ★32 · Updated last year
- ★58 · Updated 2 years ago
- Learning 1D Causal Visual Representation with De-focus Attention Networks ★35 · Updated last year
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ★62 · Updated 4 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ★108 · Updated 6 months ago
- ★23 · Updated last year
- Official code for NeurIPS 2025 paper "GRIT: Teaching MLLMs to Think with Images" ★163 · Updated last month
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ★46 · Updated last year
- [CVPR 2025 🔥] A Large Multimodal Model for Pixel-Level Visual Grounding in Videos ★91 · Updated 7 months ago
- (ICCV 2023) MasQCLIP for Open-Vocabulary Universal Image Segmentation ★37 · Updated 2 years ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ★48 · Updated last month
- [CVPR 2024] The repository contains the official implementation of "Open-Vocabulary Segmentation with Semantic-Assisted Calibration" ★75 · Updated last year
- [CVPR24] Official Implementation of GEM (Grounding Everything Module) ★134 · Updated 7 months ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything ★70 · Updated last year
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ★91 · Updated this week
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ★62 · Updated last year
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… ★98 · Updated last year
- [CVPR 2024 Highlight] ImageNet-D ★46 · Updated last year
- [CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training ★96 · Updated 4 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ★99 · Updated last year
- Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models ★19 · Updated 6 months ago
- [CVPR2025] SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories ★81 · Updated 3 months ago
- Large-Vocabulary Video Instance Segmentation dataset ★95 · Updated last year
- (ECCV 2024) Can OOD Object Detectors Learn from Foundation Models? ★25 · Updated last year