🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
⭐46 · Jun 16, 2024 · Updated last year
Alternatives and similar repositories for sesame
Users that are interested in sesame are comparing it to the libraries listed below.
- UGround: Towards Unified Visual Grounding with Unrolled Transformers · ⭐22 · Feb 15, 2026 · Updated 2 months ago
- Code release for "SegLLM: Multi-round Reasoning Segmentation" · ⭐128 · Feb 20, 2025 · Updated last year
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) · ⭐53 · Feb 4, 2026 · Updated 2 months ago
- Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents · ⭐105 · Mar 10, 2026 · Updated last month
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe… · ⭐56 · Jan 22, 2026 · Updated 2 months ago
- ⭐33 · Sep 27, 2024 · Updated last year
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models · ⭐166 · Sep 12, 2024 · Updated last year
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision · ⭐43 · Oct 19, 2025 · Updated 5 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models · ⭐109 · May 29, 2025 · Updated 10 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model · ⭐21 · Jul 20, 2024 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. · ⭐260 · Feb 11, 2025 · Updated last year
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024 · ⭐18 · Oct 11, 2024 · Updated last year
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model · ⭐209 · Aug 5, 2024 · Updated last year
- 🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" · ⭐41 · Nov 21, 2025 · Updated 4 months ago
- code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation" · ⭐23 · Nov 24, 2025 · Updated 4 months ago
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation · ⭐168 · Nov 8, 2025 · Updated 5 months ago
- 🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" · ⭐26 · Feb 9, 2025 · Updated last year
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning · ⭐43 · Mar 2, 2026 · Updated last month
- T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation (ICCV'25) · ⭐48 · Oct 6, 2025 · Updated 6 months ago
- [ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation · ⭐51 · Mar 20, 2025 · Updated last year
- Video Reasoning Segmentation · ⭐27 · Nov 29, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… · ⭐951 · Aug 5, 2025 · Updated 8 months ago
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". · ⭐180 · Dec 13, 2024 · Updated last year
- Training, optimization and deployment of Object Detection model with dinov2 backbone for efficient inference on NVIDIA Jetson · ⭐13 · Jul 26, 2025 · Updated 8 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 · ⭐51 · Oct 12, 2025 · Updated 6 months ago
- Initial code for computer vision experiments · ⭐11 · Jan 1, 2023 · Updated 3 years ago
- [ICCV 2023] CTVIS: Consistent Training for Online Video Instance Segmentation · ⭐81 · Oct 15, 2023 · Updated 2 years ago
- The code of 'The devil is in the labels: Semantic segmentation from sentences'. · ⭐13 · Nov 13, 2022 · Updated 3 years ago
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer… · ⭐45 · Dec 20, 2023 · Updated 2 years ago
- Code for the paper "Benchmarking Object Detectors with COCO: A New Path Forward." · ⭐32 · Jul 13, 2024 · Updated last year
- 🔥 [NeurIPS 2024] A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embed… · ⭐14 · Jun 21, 2025 · Updated 9 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation · ⭐78 · Sep 19, 2025 · Updated 7 months ago
- ⭐39 · Mar 5, 2026 · Updated last month
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" · ⭐131 · Aug 21, 2024 · Updated last year
- ⭐16 · Dec 9, 2023 · Updated 2 years ago
- LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning · ⭐195 · Apr 16, 2024 · Updated 2 years ago
- Activity Grammars for Temporal Action Segmentation (NeurIPS 2023) · ⭐14 · Jun 14, 2024 · Updated last year
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" · ⭐270 · Dec 30, 2024 · Updated last year
- ⭐15 · Apr 6, 2026 · Updated last week