360CVGroup / FG-CLIP
☆18 Updated last week
Alternatives and similar repositories for FG-CLIP:
Users who are interested in FG-CLIP are comparing it to the libraries listed below.
- Our 2nd-gen LMM ☆33 Updated 11 months ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed ☆89 Updated 6 months ago
- Chinese CLIP models with SOTA performance. ☆55 Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group ☆143 Updated 9 months ago
- Large Multimodal Model ☆15 Updated last year
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆39 Updated 7 months ago
- A multimodal large-scale model, which performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the p… ☆14 Updated last year
- ☆87 Updated 10 months ago
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension ☆26 Updated last year
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding ☆56 Updated 6 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 Updated 6 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆49 Updated last year
- MLLM-DataEngine: An Iterative Refinement Approach for MLLM ☆46 Updated 11 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 7 months ago
- A Token-level Text Image Foundation Model for Document Understanding ☆91 Updated this week
- The official repository for the RealSyn dataset ☆28 Updated last week
- ☆67 Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ☆16 Updated 2 months ago
- ☆56 Updated last year
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆90 Updated 3 months ago
- Precision Search through Multi-Style Inputs ☆69 Updated 2 weeks ago
- A subset of YFCC100M. Tools, checking scripts, and web-drive links for downloading the datasets (uncompressed). ☆19 Updated 5 months ago
- Official implementation of TagAlign ☆34 Updated 4 months ago
- Workshop on Foundation Model 1st foundation model challenge Track1 codebase (Open TransMind v1.0) ☆18 Updated 2 years ago
- ☆19 Updated last year
- Official code for the paper: Exploring Domain Incremental Video Highlights Detection with the LiveFood Benchmark ☆35 Updated last year
- ☆43 Updated 4 months ago
- [ICCV2023] TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance ☆91 Updated 9 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆48 Updated 4 months ago