CuriseJia / FreeStyleRet
Precision Search through Multi-Style Inputs
☆70 Updated last month
Alternatives and similar repositories for FreeStyleRet
Users who are interested in FreeStyleRet compare it to the repositories listed below.
- New generation of CLIP with fine grained discrimination capability, ICML2025 ☆158 Updated 2 weeks ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ☆221 Updated 11 months ago
- [CVPR2025] Official implementation of High Fidelity Scene Text Synthesis. ☆63 Updated 2 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo… ☆29 Updated 8 months ago
- [WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models" ☆33 Updated 2 months ago
- [ICLR2025] Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want ☆74 Updated 4 months ago
- [ECCV2024] Towards Reliable Advertising Image Generation Using Human Feedback ☆49 Updated 6 months ago
- [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT ☆136 Updated last year
- ☆52 Updated last month
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆126 Updated 6 months ago
- ☆61 Updated last month
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 Updated 9 months ago
- ☆85 Updated last year
- ☆75 Updated 3 months ago
- The official implementation of RAR ☆88 Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models ☆52 Updated last year
- ECCV2024_Parrot Captions Teach CLIP to Spot Text ☆65 Updated 8 months ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆147 Updated 2 weeks ago
- Video dataset dedicated to portrait-mode video recognition. ☆52 Updated 5 months ago
- A lightweight flexible Video-MLLM developed by TencentQQ Multimedia Research Team. ☆71 Updated 7 months ago
- ☆94 Updated last month
- Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi… ☆49 Updated last year
- 【NeurIPS 2024】Dense Connector for MLLMs ☆165 Updated 7 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆60 Updated 7 months ago
- ☆76 Updated 2 months ago
- ☆148 Updated 4 months ago
- ☆87 Updated 11 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆141 Updated 10 months ago
- [CVPR 2025] InstanceCap: Improving Text-to-Video Generation via Instance-aware Structured Caption 🔍 ☆42 Updated last month
- Codebase for the Recognize Anything Model (RAM) ☆79 Updated last year