CuriseJia / FreeStyleRet
Precision Search through Multi-Style Inputs
☆60 · Updated 5 months ago
Alternatives and similar repositories for FreeStyleRet:
Users who are interested in FreeStyleRet are comparing it to the repositories listed below.
- Video dataset dedicated to portrait-mode video recognition. ☆41 · Updated last month
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆88 · Updated last month
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆47 · Updated 5 months ago
- ☆76 · Updated 8 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆187 · Updated 6 months ago
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆18 · Updated last month
- [ECCV2024] Towards Reliable Advertising Image Generation Using Human Feedback ☆41 · Updated 2 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆28 · Updated 3 months ago
- Official implementation of TagAlign ☆34 · Updated last month
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆130 · Updated last month
- ☆52 · Updated last week
- Official implementation of High Fidelity Scene Text Synthesis. ☆45 · Updated 2 weeks ago
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆106 · Updated 2 months ago
- ☆132 · Updated this week
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 · Updated 11 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆38 · Updated 3 months ago
- The official implementation of RAR ☆78 · Updated 9 months ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆58 · Updated 4 months ago
- VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection ☆41 · Updated this week
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆63 · Updated 2 months ago
- ☆64 · Updated last month
- Unified Multi-modal IAA Baseline and Benchmark ☆72 · Updated 3 months ago
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆46 · Updated last year
- [NeurIPS 2024 D&B Track] Official Repo for "LVD-2M: A Long-take Video Dataset with Temporally Dense Captions" ☆45 · Updated 3 months ago
- ☆47 · Updated last month
- [CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation ☆67 · Updated 6 months ago
- ☆61 · Updated 2 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆63 · Updated 4 months ago
- 【NeurIPS 2024】Dense Connector for MLLMs ☆154 · Updated 3 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆68 · Updated 3 months ago