taolinzhang / 3DVLP
[AAAI 2024] Official PyTorch implementation of the paper: Vision-Language Pre-training with Object Contrastive Learning for 3D Scene Understanding
☆13Updated last year
Alternatives and similar repositories for 3DVLP
Users who are interested in 3DVLP are comparing it to the libraries listed below
- [ICLR'25] Do Egocentric Video-Language Models Truly Understand Hand-Object Interactions?☆12Updated 10 months ago
- [ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding☆23Updated 11 months ago
- [IEEE TPAMI-2024] Pair then Relation: Pair-Net for Panoptic Scene Graph Generation☆99Updated last year
- [IEEE TCSVT] Official PyTorch implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation.☆47Updated last year
- [AAAI 2024] The official implementation of the paper "3D-STMN: Dependency-Driven Superpoint-Text Matching Network for End-to-End 3D Refer…☆44Updated 2 years ago
- Official implementation of "A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives", accepted at CVPR 2…☆24Updated last year
- [CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"☆36Updated last year
- An official repo for WACV 2025 paper "LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spa…☆26Updated last year
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆49Updated last year
- [IJCV 2025] VLPrompt-PSG: Vision-Language Prompting for Panoptic Scene Graph Generation☆28Updated last year
- TrackGPT: Track What You Need in Videos via Text Prompts☆25Updated 2 years ago
- Repository for the paper: Teaching VLMs to Localize Specific Objects from In-context Examples☆40Updated last year
- The official implementation of JM3D.☆31Updated 5 months ago
- Code for the ICML 2023 paper "Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation"☆37Updated 2 years ago
- The official repo of our work "Pensieve: Retrospect-then-Compare mitigates Visual Hallucination"☆15Updated last year
- This is the project for 'USG'.☆35Updated 10 months ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆72Updated 2 months ago
- Benchmarking Panoptic Video Scene Graph Generation (PVSG), CVPR'23☆102Updated last year
- Official implementation of CVPR 2024 paper "Retrieval-Augmented Open-Vocabulary Object Detection".☆44Updated last year
- [ECCV 2024] Empowering 3D Visual Grounding with Reasoning Capabilities☆80Updated last year
- Learning 1D Causal Visual Representation with De-focus Attention Networks☆35Updated last year
- Official implementation (PyTorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…☆23Updated last year
- [ECCV 2024] ControlCap: Controllable Region-level Captioning☆80Updated last year
- Disentangled Pre-training for Human-Object Interaction Detection☆27Updated 4 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆72Updated 2 years ago
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation☆82Updated 7 months ago