BIGBALLON / UME-Search
Toward Universal Multimodal Embedding
☆55 Updated last month
Alternatives and similar repositories for UME-Search
Users who are interested in UME-Search are comparing it to the repositories listed below.
- Research Code for Multimodal-Cognition Team in Ant Group ☆164 Updated last month
- Chinese CLIP models with SOTA performance. ☆57 Updated 2 years ago
- Building a VLM model starts from the basic module. ☆17 Updated last year
- ☆70 Updated 2 years ago
- ☆30 Updated last year
- Multimodal chatbot with computer vision capabilities integrated, our 1st-gen LMM ☆100 Updated last year
- TaiSu(太素)--a large-scale Chinese multimodal dataset(亿级大规模中文视觉语言预训练数据集) ☆191 Updated last year
- ☆182 Updated last year
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ☆19 Updated 6 months ago
- ☆21 Updated 2 months ago
- [ACL 2025 Oral] 🔥🔥 MegaPairs: Massive Data Synthesis for Universal Multimodal Retrieval ☆219 Updated 3 months ago
- ☆166 Updated last year
- ☆87 Updated last year
- ☆57 Updated last year
- This repository contains the dataset, codebase, and benchmarks for our paper: <CNVid-3.5M: Build, Filter, and Pre-train the Large-scale P… ☆25 Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models ☆51 Updated last year
- official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆33 Updated last month
- Large Multimodal Model ☆15 Updated last year
- [CVPR 2023 Workshop] The code reproduce the results of our solutions on both tracks for Meta AI Video Similarity Challenge (CVPR 2023 Wor… ☆54 Updated 2 years ago
- Product1M ☆87 Updated 2 years ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed ☆99 Updated 10 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support. ☆115 Updated 6 months ago
- [IJCV 2024] TransDETR: End-to-end Video Text Spotting with Transformer ☆104 Updated last year
- A new video text spotting framework with Transformer ☆77 Updated 3 years ago
- The huggingface implementation of Fine-grained Late-interaction Multi-modal Retriever. ☆96 Updated 3 months ago
- New generation of CLIP with fine grained discrimination capability, ICML2025 ☆283 Updated last month
- ☆15 Updated 3 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆150 Updated last year
- An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08… ☆30 Updated 2 months ago
- ☆174 Updated 6 months ago