weiji-Feng / Image2Poem
A project that generates ancient poems from pictures, using CLIP, T5, and GPT2 models
☆22 Updated 8 months ago
Alternatives and similar repositories for Image2Poem
Users who are interested in Image2Poem are comparing it to the libraries listed below
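- An image captioning model based on ClipCap ☆316 Updated 3 years ago
 - This project aims to retrieve images that match an input text description. ☆43 Updated 2 years ago
 - Notes on multimodal knowledge for large language model (LLM) algorithm (application) engineers ☆249 Updated last year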
 - Building a VLM model starts from the basic module. ☆18 Updated last year
 - Research Code for Multimodal-Cognition Team in Ant Group ☆169 Updated 3 weeks ago
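 - A learning roadmap for newcomers to the multimodal field, covering the field's classic papers, projects, and courses. It aims to give learners a reasonably deep understanding of the field within a set period of time, so that they can carry out independent research. ☆39 Updated last year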
 - [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆152 Updated 2 months ago
 - Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models ☆698 Updated last month
 - [MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models ☆288 Updated 3 months ago
 - Study notes on the official LLaVA code ☆29 Updated last year
 - A multimodal (MM) + Chat collection ☆276 Updated 2 months ago
 - Advanced interview notes for large-model roles ☆77 Updated 5 months ago
 - mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022) ☆96 Updated 2 years ago
 - This is the first released survey paper on hallucinations of large vision-language models (LVLMs). To keep track of this field and contin… ☆82 Updated last year
 - Fine-tune Stable Diffusion with Dreambooth, LoRA, and ControlNet ☆59 Updated 2 years ago
 - WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆127 Updated 11 months ago
 - A collection of multimodal dialogue system papers I have read (with some notes) ☆23 Updated 2 years ago
 - Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer". ☆129 Updated 11 months ago
 - List of papers about Large Multimodal model ☆31 Updated 5 months ago
 - Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ☆20 Updated 8 months ago
 - Visual Instruction Tuning for Qwen2 Base Model ☆39 Updated last year
 - The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”. ☆43 Updated last year
 - TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset) ☆191 Updated last year
 - code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models" ☆19 Updated 7 months ago
 - The official code for NeurIPS 2024 paper: Harmonizing Visual Text Comprehension and Generation ☆129 Updated 11 months ago
 - This project aims to collect and collate various datasets for multimodal large model training, including but not limited to pre-training … ☆58 Updated 5 months ago
 - [arXiv 25] Aesthetics is Cheap, Show me the Text: An Empirical Evaluation of State-of-the-Art Generative Models for OCR ☆236 Updated 2 months ago
 - [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆94 Updated 2 months ago
 - [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant ☆171 Updated 3 months ago
 - [AAAI 2025 (Oral)] SAIL: Sample-Centric In-Context Learning for Document Information Extraction ☆18 Updated 10 months ago