QuentinFitteRey / VLMSAM
Qwen-SAM is a reasoning-based segmentation model that integrates Qwen 2.5 VL 7B with the Segment Anything Model (SAM). It is fine-tuned with LoRA to produce fine-grained visual segmentation from complex text prompts.
☆24Updated 8 months ago
Alternatives and similar repositories for VLMSAM
Users who are interested in VLMSAM are comparing it to the repositories listed below.
- ☆27Updated 8 months ago
- Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion (ICCV 2025)☆78Updated 4 months ago
- [CVPR2025] ProxyTransformation: Preshaping Point Cloud Manifold With Proxy Attention For 3D Visual Grounding☆48Updated 5 months ago
- ☆34Updated 8 months ago
- Official Implementation of ECCV2024 paper: SLAck☆29Updated last year
- The official implementation of "PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning" (CVPR 2025)☆28Updated 3 months ago
- ☆12Updated 9 months ago
- [ECCV24] Navigation Instruction Generation with BEV Perception and Large Language Models☆30Updated last year
- [NeurIPS 2025] LabelAny3D: Label Any Object 3D in the Wild☆117Updated last month
- Project Page for GaussianFormer☆24Updated last year
- Official implementation of paper "Controllable 3D Outdoor Scene Generation via Scene Graphs" (ICCV 2025)☆62Updated 6 months ago
- [ECCV 2024] 4D Contrastive Superflows are Dense 3D Representation Learners☆51Updated 2 months ago
- [NeurIPS2024] Multiview Scene Graph (topologically representing a scene from unposed images by interconnected place and object nodes)☆123Updated 4 months ago
- Python Toolkit for 360VOT: A New Benchmark Dataset for Omnidirectional Visual Object Tracking, ICCV2023☆44Updated 3 weeks ago
- [ICCV 2025] Official implementation of "AD-GS: Object-Aware B-Spline Gaussian Splatting for Self-Supervised Autonomous Driving"☆34Updated 6 months ago
- [ICLR'25] City-scale 3D Visual Grounding with Multi-modality LLMs☆64Updated 3 months ago
- [NeurIPS 2024] DiffSF: Diffusion Models for Scene Flow Estimation☆29Updated last year
- [AAAI 2024] VSFormer: Visual-Spatial Fusion Transformer for Correspondence Pruning☆14Updated last year
- Official implementation of NavMorph: A Self-Evolving World Model for Vision-and-Language Navigation in Continuous Environments (ICCV'25).☆66Updated last month
- CVPR 2025: VoxelSplat: Dynamic Gaussian Splatting as an Effective Loss for Occupancy and Flow Prediction☆73Updated 6 months ago
- ☆48Updated 2 years ago
- [ICRA 2024] This is the official repo of paper "HR-APR: APR-agnostic Framework with Uncertainty Estimation and Hierarchical Refinement for…☆11Updated last year
- [NeurIPS 2024] XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation☆36Updated last year
- [ACM MM24 Poster] Official implementation of paper "MVPbev: Multi-view Perspective Image Generation from BEV with Test-time Controllabili…☆20Updated 5 months ago
- Code Release for ECCV 2024, "PCF-Lift: Panoptic Lifting by Probabilistic Contrastive Fusion"☆21Updated 10 months ago
- [ICCV 2025] IGL-Nav: Incremental 3D Gaussian Localization for Image-goal Navigation☆63Updated 6 months ago
- [ECCV 2024] Official Implementation of "Appearance-Based Refinement for Object-Centric Motion Segmentation" Junyu Xie, Weidi Xie, Andrew …☆14Updated last year
- Open-Vocabulary Panoptic Segmentation☆27Updated 7 months ago
- ☆53Updated last year
- ☆43Updated 2 years ago