NMS05 / DinoV2-SigLIP-Phi3-LoRA-VLM
☆31 Updated last year
Alternatives and similar repositories for DinoV2-SigLIP-Phi3-LoRA-VLM
Users who are interested in DinoV2-SigLIP-Phi3-LoRA-VLM are comparing it to the libraries listed below.
- Open-Vocabulary Panoptic Segmentation ☆24 Updated 8 months ago
- ☆36 Updated last month
- ROOT: VLM based System for Indoor Scene Understanding and Beyond ☆28 Updated 4 months ago
- Codes for ICML 2023 Learning Dynamic Query Combinations for Transformer-based Object Detection and Segmentation ☆37 Updated last year
- (ICLR 2024, CVPR 2024) SparseFormer ☆74 Updated 6 months ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries". ☆56 Updated last year
- [arXiv 2025] Can MLLMs Guide Me Home? A Benchmark Study on Fine-Grained Visual Reasoning from Transit Maps ☆26 Updated 3 weeks ago
- ☆28 Updated 4 months ago
- Source code for "To Adapt or Not to Adapt? Real-Time Adaptation for Semantic Segmentation", ICCV 2023 ☆48 Updated 11 months ago
- Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces ☆35 Updated this week
- [ICLR 2025] Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception ☆10 Updated last month
- ☆9 Updated 2 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆95 Updated 10 months ago
- ☆42 Updated 3 weeks ago
- LEO: A powerful Hybrid Multimodal LLM ☆18 Updated 4 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆26 Updated 2 months ago
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation. ☆43 Updated 4 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆74 Updated 3 months ago
- Harnessing CLIP, DINO and SAM for Open Vocabulary Segmentation ☆58 Updated 3 months ago
- ☆81 Updated 2 months ago
- ☆18 Updated 2 years ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆51 Updated 5 months ago
- Official implementation of the WACV 2025 paper "3D Part Segmentation via Geometric Aggregation of 2D Visual Features" ☆18 Updated 2 months ago
- [ECCV-24] This is the official implementation of the paper "SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation". ☆22 Updated 7 months ago
- [AAAI 2024] Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-Supervised 3D Object Detection ☆11 Updated 4 months ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 Updated last year
- Unifying 2D and 3D Vision-Language Understanding ☆82 Updated last month
- [CVPR 2025 Highlight] Towards Autonomous Micromobility through Scalable Urban Simulation ☆29 Updated last month
- ☆58 Updated last year
- ☆20 Updated 2 months ago