OpenGVLab / InternVL-MMDetSeg
Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed
☆54 Updated 2 weeks ago
Related projects
Alternatives and complementary repositories for InternVL-MMDetSeg
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆76 Updated 4 months ago
- ☆78 Updated 9 months ago
- This repo contains extensions to DINO V2 model by Meta, and awesome applications built on top of it. ☆38 Updated last year
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆90 Updated 3 months ago
- 【ECCV2024】The official repo of Griffon series ☆102 Updated this week
- 1st solution for the Webly-supervised Fine-grained Recognition competition in https://www.cvmart.net/race/10412/base ☆33 Updated last year
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding ☆33 Updated 2 weeks ago
- [ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance ☆66 Updated 3 months ago
- DynRefer: Delving into Region-level Multi-modality Tasks via Dynamic Resolution ☆39 Updated 3 months ago
- ☆148 Updated last month
- Distilling the powerful segment anything models into lightweight ones for efficient segmentation. ☆29 Updated last year
- Zero-label image classification via OpenCLIP knowledge distillation ☆112 Updated last year
- InstaGen: Enhancing Object Detection by Training on Synthetic Dataset, CVPR2024 ☆73 Updated 7 months ago
- ☆99 Updated 4 months ago
- [CVPR 2024] Official implementation of "ViTamin: Designing Scalable Vision Models in the Vision-language Era" ☆173 Updated 5 months ago
- OpenMMLab Detection Toolbox and Benchmark for V3Det ☆15 Updated 7 months ago
- [CVPR2022] "Progressive End-to-End Object Detection in Crowded Scenes" on Deformable-DETR. ☆28 Updated 2 years ago
- A Dead Simple and Modularized Multi-Modal Training and Finetune Framework. Compatible to any LLaVA/Flamingo/QwenVL/MiniGemini etc series … ☆17 Updated 6 months ago
- Training LLaMA language model with MMEngine! It supports LoRA fine-tuning! ☆40 Updated last year
- [ICCV'23] Cascade-DETR: Delving into High-Quality Universal Object Detection ☆95 Updated last year
- PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding. PixelLM is accepted by CVPR 2024. ☆178 Updated 5 months ago
- [EMNLP 2024] RWKV-CLIP: A Robust Vision-Language Representation Learner ☆110 Updated this week
- ☆38 Updated 2 years ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆26 Updated last month
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer ☆102 Updated 7 months ago
- PromptDet: Towards Open-vocabulary Detection using Uncurated Images, ECCV2022 ☆160 Updated 2 years ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆115 Updated last month
- Official Implementation of ICCV 2023 Paper - SegPrompt: Boosting Open-World Segmentation via Category-level Prompt Learning ☆110 Updated 2 months ago
- ☆23 Updated last week
- [ECCV2024] This is an official implementation for "PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model" ☆188 Updated 2 months ago