🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)"
★47 · Jun 16, 2024 · Updated last year
Alternatives and similar repositories for sesame
Users that are interested in sesame are comparing it to the libraries listed below.
- UGround: Towards Unified Visual Grounding with Unrolled Transformers ★22 · Feb 15, 2026 · Updated last month
- Code release for "SegLLM: Multi-round Reasoning Segmentation" ★128 · Feb 20, 2025 · Updated last year
- Rui Qian, Xin Yin, Dejing Dou†: Reasoning to Attend: Try to Understand How <SEG> Token Works (CVPR 2025) ★52 · Feb 4, 2026 · Updated last month
- Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents ★103 · Mar 10, 2026 · Updated 2 weeks ago
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe… ★56 · Jan 22, 2026 · Updated 2 months ago
- (no description) ★33 · Sep 27, 2024 · Updated last year
- [CVPR2024] GSVA: Generalized Segmentation via Multimodal Large Language Models ★164 · Sep 12, 2024 · Updated last year
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision ★43 · Oct 19, 2025 · Updated 5 months ago
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models ★109 · May 29, 2025 · Updated 10 months ago
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ★20 · Jul 20, 2024 · Updated last year
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. ★257 · Feb 11, 2025 · Updated last year
- The official repo for "Stepping Stones: A Progressive Training Strategy for Audio-Visual Semantic Segmentation", ECCV 2024 ★18 · Oct 11, 2024 · Updated last year
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ★208 · Aug 5, 2024 · Updated last year
- This is official code for "Combating Label Distribution Shift for Active Domain Adaptation" accepted in ECCV2022 ★15 · Oct 25, 2022 · Updated 3 years ago
- 🔥 [ICLR 2025] Official Benchmark Toolkits for "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ★40 · Nov 21, 2025 · Updated 4 months ago
- code for the paper "CoReS: Orchestrating the Dance of Reasoning and Segmentation" ★23 · Nov 24, 2025 · Updated 4 months ago
- [ICLR2025] Text4Seg: Reimagining Image Segmentation as Text Generation ★166 · Nov 8, 2025 · Updated 4 months ago
- 🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark" ★26 · Feb 9, 2025 · Updated last year
- Open-vocabulary Semantic Segmentation ★33 · Feb 16, 2024 · Updated 2 years ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ★43 · Mar 2, 2026 · Updated 3 weeks ago
- T2I-Copilot: A Training-Free Multi-Agent Text-to-Image System for Enhanced Prompt Interpretation and Interactive Generation (ICCV'25) ★46 · Oct 6, 2025 · Updated 5 months ago
- [ECCV 2024] SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation ★51 · Mar 20, 2025 · Updated last year
- Video Reasoning Segmentation ★27 · Nov 29, 2024 · Updated last year
- (no description) ★21 · Feb 10, 2025 · Updated last year
- [CVPR2025] Project for "HyperSeg: Towards Universal Visual Segmentation with Large Language Model". ★180 · Dec 13, 2024 · Updated last year
- [CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses tha… ★951 · Aug 5, 2025 · Updated 7 months ago
- Training, optimization and deployment of Object Detection model with dinov2 backbone for efficient inference on NVIDIA Jetson ★13 · Jul 26, 2025 · Updated 8 months ago
- The official repo for "Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes", ECCV 2024 ★50 · Oct 12, 2025 · Updated 5 months ago
- Initial code for computer vision experiments ★11 · Jan 1, 2023 · Updated 3 years ago
- [ICCV 2023] CTVIS: Consistent Training for Online Video Instance Segmentation ★81 · Oct 15, 2023 · Updated 2 years ago
- The code of 'The devil is in the labels: Semantic segmentation from sentences'. ★13 · Nov 13, 2022 · Updated 3 years ago
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer… ★44 · Dec 20, 2023 · Updated 2 years ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ★77 · Sep 19, 2025 · Updated 6 months ago
- Code for the paper "Benchmarking Object Detectors with COCO: A New Path Forward." ★32 · Jul 13, 2024 · Updated last year
- 🔥 [NeurIPS 2024] A Cat Is A Cat (Not A Dog!): Unraveling Information Mix-ups in Text-to-Image Encoders through Causal Analysis and Embed… ★14 · Jun 21, 2025 · Updated 9 months ago
- (no description) ★39 · Mar 5, 2026 · Updated 3 weeks ago
- (no description) ★16 · Dec 9, 2023 · Updated 2 years ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ★131 · Aug 21, 2024 · Updated last year
- Activity Grammars for Temporal Action Segmentation (NeurIPS 2023) ★14 · Jun 14, 2024 · Updated last year