YiyiyiZhao / VIALM
Survey and Benchmark of VIALM
☆9 Updated last year
Alternatives and similar repositories for VIALM:
Users who are interested in VIALM are comparing it to the repositories listed below.
- The code for On Robust Cross-View Consistency in Outdoor Self-Supervised Monocular Depth Estimation ☆13 Updated last year
- ☆11 Updated 4 months ago
- Code for Open3DTrack: Towards Open-Vocabulary 3D Multi-Object Tracking ☆23 Updated 3 months ago
- ☆29 Updated 2 months ago
- A curated list of research papers in Referring Expression Comprehension (REC) ☆43 Updated 3 years ago
- This is the official repo for Contrastive Vision-Language Alignment Makes Efficient Instruction Learner. ☆20 Updated last year
- ☆14 Updated last year
- This is the official implementation of "LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels" (Accepted at C… ☆23 Updated 7 months ago
- Code for "Visual Spatial Description: Controlled Spatial-Oriented Image-to-Text Generation" ☆26 Updated 10 months ago
- [NeurIPS 2023] OV-PARTS: Towards Open-Vocabulary Part Segmentation ☆76 Updated 7 months ago
- ☆16 Updated last year
- Generative Bias for Robust Visual Question Answering (CVPR 2023) ☆26 Updated last year
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models ☆37 Updated 3 weeks ago
- This is the repository for DDS3D (ICRA 2023) ☆16 Updated last year
- VLPrompt: Vision-Language Prompting for Panoptic Scene Graph Generation ☆24 Updated 4 months ago
- ☆34 Updated 10 months ago
- ☆54 Updated 5 months ago
- [ECCV 2024 Best Paper Candidate] Implementation of "Expanding Scene Graph Boundaries: Fully Open-vocabulary Scene Graph Generation via Vi… ☆48 Updated this week
- This repository provides a multi-task benchmark for instance segmentation, depth estimation, and 3D object detection. ☆14 Updated last year
- [CVPR 2023] MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training ☆47 Updated last year
- ☆25 Updated last year
- [ECCV 2022] 🎵PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation ☆57 Updated 2 years ago
- Emerging Pixel Grounding in Large Multimodal Models Without Grounding Supervision ☆29 Updated 3 months ago
- [CVPR 2022] X-Trans2Cap: Cross-Modal Knowledge Transfer using Transformer for 3D Dense Captioning ☆33 Updated 2 years ago
- This repository provides data for the VAW dataset as described in the CVPR 2021 paper titled "Learning to Predict Visual Attributes in th… ☆62 Updated 2 years ago
- ☆16 Updated 11 months ago
- This is the code related to "Context-aware Alignment and Mutual Masking for 3D-Language Pre-training" (CVPR 2023). ☆25 Updated last year
- A systematic survey of multi-modal and multi-task visual understanding foundation models for driving scenarios ☆48 Updated 8 months ago
- A Simple Active-and-Adaptive Baseline for Cross-Domain 3D Semantic Segmentation ☆13 Updated 2 years ago
- [AAAI 2023 Oral] Language-Assisted 3D Feature Learning for Semantic Scene Understanding ☆12 Updated last year