deepglint / unicom
MLCD & UNICOM : Large-Scale Visual Representation Model
☆382 Updated this week
Related projects
Alternatives and complementary repositories for unicom
- [ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization ☆566 Updated 5 months ago
- [ECCV 2024] The official code of paper "Open-Vocabulary SAM". ☆950 Updated 3 months ago
- Official implementation of "Towards Efficient Visual Adaption via Structural Re-parameterization". ☆199 Updated 7 months ago
- Code for AAAI 2024 paper: Relax Image-Specific Prompt Requirement in SAM: A Single Generic Prompt for Segmenting Camouflaged Objects ☆139 Updated last month
- (AAAI 2024) BLIVA: A Simple Multimodal LLM for Better Handling of Text-rich Visual Questions ☆270 Updated 7 months ago
- 【AAAI'2023 & IJCV】Transferring Vision-Language Models for Visual Recognition: A Classifier Perspective ☆205 Updated 5 months ago
- Open source implementation of "A Self-Supervised Descriptor for Image Copy Detection" (SSCD). ☆265 Updated 2 years ago
- Real-time and accurate open-vocabulary end-to-end object detection ☆1,534 Updated 2 months ago
- SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree ☆289 Updated 2 weeks ago
- [CVPR'23] Universal Instance Perception as Object Discovery and Retrieval ☆1,503 Updated last year
- Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023) ☆268 Updated 9 months ago
- [CVPR 2023] Prompt, Generate, then Cache: Cascade of Foundation Models makes Strong Few-shot Learners ☆349 Updated last year
- The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM". ☆215 Updated last month
- An efficient modular implementation of Associating Objects with Transformers for Video Object Segmentation in PyTorch ☆608 Updated 8 months ago
- CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet ☆208 Updated last year
- EntitySeg Toolbox: Towards Open-World and High-Quality Image Segmentation ☆700 Updated 11 months ago
- Cross-modal few-shot adaptation with CLIP ☆316 Updated 8 months ago
- 【CVPR'2023 Highlight & TPAMI】Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? ☆240 Updated 2 months ago
- (TPAMI 2024) A Survey on Open Vocabulary Learning ☆845 Updated 3 weeks ago
- An open-source implementation for training LLaVA-NeXT. ☆395 Updated last month
- u-LLaVA: Unifying Multi-Modal Tasks via Large Language Model ☆138 Updated 4 months ago
- GPT4Vis: What Can GPT-4 Do for Zero-shot Visual Recognition? ☆207 Updated 6 months ago
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception ☆489 Updated 6 months ago
- ☆473 Updated 2 years ago
- MLLM for On-Demand Spatial-Temporal Understanding at Arbitrary Resolution ☆291 Updated last week
- [ICCV 2023] A large-scale high-resolution dataset satisfies all important data features about document shadow, covers a large number of d… ☆202 Updated 5 months ago
- ICCV 2023 Paper Global Features are All You Need for Image Retrieval and Reranking Official Repository ☆207 Updated last year
- Code release for "UniVS: Unified and Universal Video Segmentation with Prompts as Queries" (CVPR2024) ☆170 Updated 4 months ago
- [ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843) ☆184 Updated 8 months ago
- Official PyTorch implementation of "EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM" ☆942 Updated 3 months ago