🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
★47 · Jun 16, 2024 · Updated last year
Alternatives and similar repositories for sesame
Users who are interested in sesame are comparing it to the libraries listed below
- UGround: Towards Unified Visual Grounding with Unrolled Transformers ★21 · Feb 15, 2026 · Updated 3 weeks ago
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ★51 · Feb 4, 2026 · Updated last month
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ★127 · Feb 20, 2025 · Updated last year
- ★33 · Sep 27, 2024 · Updated last year
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024 ★18 · Oct 11, 2024 · Updated last year
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision ★42 · Oct 19, 2025 · Updated 4 months ago
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ★162 · Sep 12, 2024 · Updated last year
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ★108 · May 29, 2025 · Updated 9 months ago
- Open-vocabulary Semantic Segmentation ★33 · Feb 16, 2024 · Updated 2 years ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ★19 · Jul 20, 2024 · Updated last year
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ★205 · Aug 5, 2024 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ★255 · Feb 11, 2025 · Updated last year
- Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents ★99 · Feb 2, 2026 · Updated last month
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe… ★54 · Jan 22, 2026 · Updated last month
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ★180 · Dec 13, 2024 · Updated last year
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ★163 · Nov 8, 2025 · Updated 4 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ★50 · Oct 12, 2025 · Updated 4 months ago
- Code for the paper "Benchmarking Object Detectors with COCO: A New Path Forward." ★32 · Jul 13, 2024 · Updated last year
- The code of 'The devil is in the labels: Semantic segmentation from sentences'. ★13 · Nov 13, 2022 · Updated 3 years ago
- Training, optimization and deployment of Object Detection model with dinov2 backbone for efficient inference on NVIDIA Jetson ★13 · Jul 26, 2025 · Updated 7 months ago
- ★15 · Jan 12, 2026 · Updated last month
- This repo holds the official code and data for "Unveiling Parts Beyond Objects: Towards Finer-Granularity Referring Expression Segmentati… ★72 · Jun 3, 2024 · Updated last year
- Code release for "Strike a Balance in Continual Panoptic Segmentation" (ECCV 2024) ★14 · Mar 14, 2025 · Updated 11 months ago
- Initial code for computer vision experiments ★11 · Jan 1, 2023 · Updated 3 years ago
- Video Reasoning Segmentation ★28 · Nov 29, 2024 · Updated last year
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ★41 · Mar 2, 2026 · Updated last week
- [ICCV 2023] CTVIS: Consistent Training for Online Video Instance Segmentation ★80 · Oct 15, 2023 · Updated 2 years ago
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" ★269 · Dec 30, 2024 · Updated last year
- [ICCV 2025] Dynamic-VLM ★28 · Dec 16, 2024 · Updated last year
- 🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ★39 · Nov 21, 2025 · Updated 3 months ago
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ★945 · Aug 5, 2025 · Updated 7 months ago
- [ICLR'26] Official PyTorch implementation of "Time Is a Feature: Exploiting Temporal Dynamics in Diffusion Language Models". ★62 · Feb 6, 2026 · Updated last month
- SpyGame: An interactive multi-agent framework to evaluate intelligence with large language models :D ★15 · Nov 9, 2023 · Updated 2 years ago
- ★16 · Dec 9, 2023 · Updated 2 years ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ★131 · Aug 21, 2024 · Updated last year
- ★32 · Feb 29, 2024 · Updated 2 years ago
- [ICCV 2023 Workshop] The Official Implementation of The First Prize Solution for RVOS Competition ★14 · Jan 1, 2024 · Updated 2 years ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ★76 · Sep 19, 2025 · Updated 5 months ago
- ★59 · Sep 14, 2024 · Updated last year