MaureenZOU / detectron2-xyz
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
☆18, updated 2 years ago
Alternatives and similar repositories for detectron2-xyz:
Users who are interested in detectron2-xyz are comparing it to the libraries listed below.
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model (☆93, updated 7 months ago)
- This is the official repo for ByteVideoLLM/Dynamic-VLM (☆20, updated 2 months ago)
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" (☆33, updated 8 months ago)
- Official repository of paper "Subobject-level Image Tokenization" (☆65, updated 10 months ago)
- [CVPR2025] Code Release of F-LMM: Grounding Frozen Large Multimodal Models (☆68, updated 7 months ago)
- Code release for "SegLLM: Multi-round Reasoning Segmentation" (☆69, updated 2 weeks ago)
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… (☆29, updated 5 months ago)
- Detectron2 Toolbox and Benchmark for V3Det (☆16, updated 9 months ago)
- ☆19, updated last year
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want (☆66, updated last month)
- ☆29, updated 11 months ago
- ☆33, updated last year
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… (☆35, updated 8 months ago)
- This repository is for the first survey on SAM for videos. (☆32, updated last month)
- ☆58, updated last year
- [CVPR 2025] DynRefer: Delving into Region-level Multimodal Tasks via Dynamic Resolution (☆44, updated this week)
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model (☆42, updated 7 months ago)
- [ICCV2023] EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding (☆75, updated last year)
- [EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models" (☆15, updated 4 months ago)
- ☆58, updated last year
- [ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea… (☆97, updated 10 months ago)
- ECCV2024_Parrot Captions Teach CLIP to Spot Text (☆64, updated 6 months ago)
- [AAAI2025] ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues (☆53, updated 2 months ago)
- ☆104, updated 8 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment (☆41, updated 2 months ago)
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges (☆65, updated last week)
- ☆29, updated 5 months ago
- Training code for CLIP-FlanT5 (☆24, updated 7 months ago)
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision (☆32, updated 4 months ago)
- OpenMMLab Detection Toolbox and Benchmark for V3Det (☆15, updated 11 months ago)