CuriseJia / FreeStyleRet

Precision Search through Multi-Style Inputs

☆71

Alternatives and similar repositories for FreeStyleRet

Users who are interested in FreeStyleRet are comparing it to the repositories listed below.

rotem-shalev / ImageRAG
☆84 Updated 4 months ago
ShareGPT4Omni / ShareGPT4V
[ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions
☆224 Updated last year
Fr0zenCrane / Cockatiel
The official implementation of our paper "Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption"
☆34 Updated last month
sterzhang / image-textualization
Image Textualization: An Automatic Framework for Generating Rich and Detailed Image Descriptions (NeurIPS 2024)
☆164 Updated 11 months ago
hlchen23 / ADPN-MM
Repository for 23'MM accepted paper "Curriculum-Listener: Consistency- and Complementarity-Aware Audio-Enhanced Temporal Sentence Groundi…
☆49 Updated last year
CodeGoat24 / DreamText
[CVPR2025] Official implementation of High Fidelity Scene Text Synthesis.
☆67 Updated 3 months ago
Mowenyii / PAE
[CVPR 2024] Dynamic Prompt Optimizing for Text-to-Image Generation
☆72 Updated last year
alipay / Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
☆154 Updated last week
ggg0919 / cantor
☆85 Updated last year
Token-family / TokenFD
[ICCV2025] A Token-level Text Image Foundation Model for Document Understanding
☆105 Updated 2 weeks ago
cnzzx / VSA
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆125 Updated 8 months ago
scenarios / WeMM
☆87 Updated last year
ZhangXJ199 / TinyLLaVA-Video
A Simple Framework of Small-scale LMMs for Video Understanding
☆71 Updated last month
thunlp / Migician
[ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models
☆68 Updated last month
xmu-xiaoma666 / Multimodal-Open-O1
Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…
☆29 Updated 9 months ago
showlab / VisorGPT
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
☆136 Updated last year
invictus717 / MiCo
[ICCV'25] Explore the Limits of Omni-modal Pretraining at Scale
☆105 Updated 10 months ago
mengcye / LAION-SG
☆53 Updated 2 months ago
deepglint / UniME
[ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆80 Updated last week
Chenguoz / CAIG
[WWW 2025] Official PyTorch Code for "CTR-Driven Advertising Image Generation with Multimodal Large Language Models"
☆43 Updated 4 months ago
mutonix / Vript
☆154 Updated 6 months ago
friedrichor / UNITE
Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"
☆24 Updated last week
mynameischaos / Lion
Lion: Kindling Vision Intelligence within Large Language Models
☆52 Updated last year
artemisp / LAVIS-XInstructBLIP
LAVIS - A One-stop Library for Language-Vision Intelligence
☆48 Updated 11 months ago
hyc2026 / StoryTeller
☆76 Updated 4 months ago
bytedance / Portrait-Mode-Video
Video dataset dedicated to portrait-mode video recognition.
☆52 Updated 7 months ago
360CVGroup / Inner-Adaptor-Architecture
LMM solved catastrophic forgetting, AAAI2025
☆44 Updated 3 months ago
callsys / TextVR
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆26 Updated last year
yuyq96 / TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
☆62 Updated 8 months ago
HuiZhang0812 / CreatiLayout
☆101 Updated 2 months ago