gyhandy / Text2Image-for-Detection
DALL-E for Detection: Language-driven Compositional Image Synthesis for Object Detection
☆17 · Updated 11 months ago
Related projects:
- [ECCV 2024] This is the official implementation of "Stitched ViTs are Flexible Vision Backbones". ☆22 · Updated 7 months ago
- Official implementation of "Improving Text-guided Object Inpainting with Semantic Pre-inpainting" (ECCV 2024) ☆20 · Updated 2 months ago
- ☆16 · Updated 2 months ago
- 🔥 [CVPR 2024] Official implementation of "See, Say, and Segment: Teaching LMMs to Overcome False Premises (SESAME)" ☆23 · Updated 3 months ago
- ☆52 · Updated last year
- ☆19 · Updated last year
- ☆27 · Updated 5 months ago
- Code for the ICML 2023 paper "Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation" ☆35 · Updated last year
- DiverGen (CVPR 2024) & BSGAL (ICML 2024) ☆33 · Updated 3 weeks ago
- ☆36 · Updated 4 months ago
- Code for the paper "Unsegment Anything by Simulating Deformation" (CVPR 2024) ☆21 · Updated 3 months ago
- ☆17 · Updated last year
- Code for Point-Level Region Contrast (https://arxiv.org/abs/2202.04639) ☆32 · Updated last year
- ☆17 · Updated last week
- ☆57 · Updated last year
- ☆20 · Updated 9 months ago
- Video Diffusion State Space Models ☆19 · Updated 5 months ago
- ☆17 · Updated 5 months ago
- Sambor: Boosting Segment Anything Model Towards Open-Vocabulary Learning ☆30 · Updated 9 months ago
- Official implementation of the CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection". ☆22 · Updated last week
- Official code repo of PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs ☆22 · Updated 3 months ago
- REVO-LION: Evaluating and Refining Vision-Language Instruction Tuning Datasets ☆11 · Updated 11 months ago
- [ECCV 2024 Oral] ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction ☆29 · Updated last month
- ☆35 · Updated last year
- MIMIC: Masked Image Modeling with Image Correspondences ☆15 · Updated 3 months ago
- Official implementation of "Interpreting and Controlling Vision Foundation Models via Text Explanations" ☆12 · Updated 3 months ago
- [ECCV'24] MaxFusion: Plug & Play multimodal generation in text to image diffusion models ☆17 · Updated last month
- ☆32 · Updated 8 months ago
- ImaginaryNet: Learning Object Detectors without Real Images and Annotations ☆24 · Updated last year
- Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training ☆15 · Updated last year