zhaoyucs / VSDLinks
Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation"
☆27Updated last year
Alternatives and similar repositories for VSD
Users that are interested in VSD are comparing it to the libraries listed below
Sorting:
- [ICCV2023] Official code for "VL-PET: Vision-and-Language Parameter-Efficient Tuning via Granularity Control"☆53Updated last year
- ☆17Updated last year
- ☆58Updated 2 years ago
- Official repository of DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models☆86Updated 11 months ago
- Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models☆37Updated last year
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding☆45Updated 2 months ago
- ☆25Updated 2 years ago
- [CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering☆36Updated last month
- Codes for ICLR 2025 Paper: Towards Semantic Equivalence of Tokenization in Multimodal LLM☆70Updated 3 months ago
- The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…☆19Updated last year
- ☆91Updated last year
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆58Updated 2 years ago
- ☆19Updated last year
- Generative Bias for Robust Visual Question Answering ( CVPR 2023 )☆27Updated 2 years ago
- UniTAB: Unifying Text and Box Outputs for Grounded VL Modeling, ECCV 2022 (Oral Presentation)☆87Updated 2 years ago
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning☆33Updated 2 months ago
- [ICCV2023] Tem-adapter: Adapting Image-Text Pretraining for Video Question Answer☆37Updated last year
- Offical PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR2023)☆40Updated 2 years ago
- CVPR2025: Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning☆35Updated 4 months ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences☆40Updated 5 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.☆27Updated this week
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"