OpenGVLab / InternVL-MMDetSeg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
☆58 Updated 3 weeks ago
Related projects
Alternatives and complementary repositories for InternVL-MMDetSeg
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding ☆40 Updated last month
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆179 Updated 5 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 Updated last week
- This repo contains the code for our paper "Towards Open-Ended Visual Recognition with Large Language Model" ☆90 Updated 4 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆77 Updated 5 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆135 Updated 2 weeks ago
- [ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance ☆66 Updated 4 months ago
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR2024 ☆73 Updated 7 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 Updated last month
- 【ECCV2024】The official repo of Griffon series ☆106 Updated 2 weeks ago
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning ☆110 Updated 3 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆53 Updated 3 weeks ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models ☆140 Updated 2 weeks ago
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" ☆193 Updated this week
- PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM is accepted by CVPR 2024. ☆182 Updated 5 months ago
- Recognize Any Regions ☆118 Updated last month
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆120 Updated last month
- PromptDet: Towards Open-vocabulary Detection using Uncurated Images, ECCV2022 ☆160 Updated 2 years ago
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆36 Updated 6 months ago