callsys / ControlCap
[ECCV 2024] ControlCap: Controllable Region-level Captioning
☆49Updated last month
Related projects: ⓘ
- OVAD: Open-vocabulary Attribute Detection code☆28Updated last year
- ☆32Updated 5 months ago
- (ICCV 2023) Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation☆43Updated 2 months ago
- Code Release of F-LMM: Grounding Frozen Large Multimodal Models☆35Updated last month
- IDA-VLM: Towards Movie Understanding via ID-Aware Large Vision-Language Model☆18Updated last week
- [CVPR2024 Highlight] Official repository of the paper "The devil is in the fine-grained details: Evaluating open-vocabulary object detect…☆39Updated last month
- Official Pytorch codebase for Open-Vocabulary Instance Segmentation without Manual Mask Annotations [CVPR 2023]☆47Updated 9 months ago
- ☆56Updated last year
- ☆57Updated last year
- Accelerating Vision-Language Pretraining with Free Language Modeling (CVPR 2023)☆31Updated last year
- [CVPR 2023] RILS: Masked Visual Reconstruction in Language Semantic Space (https://arxiv.org/abs/2301.06958)☆43Updated last year
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆36Updated last year
- Implementation of paper 'Helping Hands: An Object-Aware Ego-Centric Video Recognition Model'☆30Updated 10 months ago
- Obj2Seq: Formatting Objects as Sequences with Class Prompt for Visual Tasks (NeurIPS2022)☆84Updated last year
- Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning☆58Updated 3 months ago
- ☆20Updated last year
- Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling @ CVPR22☆42Updated last year
- [ICCV 2023] PyTorch implementation of RandBox☆51Updated 10 months ago
- [CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.☆28Updated last year
- ICCV2023: Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning☆35Updated 11 months ago
- Lumen: a Large multimodal model with versatile vision-centric capabilities☆16Updated 3 months ago
- Code for the paper "Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundatio…☆22Updated 10 months ago
- [ECCV2024] Learning Video Context as Interleaved Multimodal Sequences☆17Updated 3 weeks ago
- Simple PyTorch implementation of "Libra: Building Decoupled Vision System on Large Language Models" (accepted by ICML 2024)☆41Updated 3 months ago
- [AAAI2024] Code Release of CLIM: Contrastive Language-Image Mosaic for Region Representation☆25Updated 7 months ago
- ☆12Updated 9 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception☆103Updated 3 weeks ago
- [CVPR2024] The code of "UniPT: Universal Parallel Tuning for Transfer Learning with Efficient Parameter and Memory"☆62Updated 4 months ago
- ☆58Updated this week
- ☆45Updated last year