weiji-Feng / Image2Poem
A project that generates ancient poems from images, using CLIP, T5, and GPT-2 models.
☆22 · Updated 3 months ago
Alternatives and similar repositories for Image2Poem
Users interested in Image2Poem are comparing it to the repositories listed below.
- Update 2020 ☆75 · Updated 3 years ago
- A collection of multimodal dialogue system papers I have read (with some notes) ☆22 · Updated 2 years ago
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections (EMNLP 2022) ☆92 · Updated 2 years ago
- This project aims to retrieve images that match an input text description. ☆40 · Updated last year
- SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation ☆111 · Updated last year
- Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer" ☆128 · Updated 7 months ago
- [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations ☆134 · Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆143 · Updated 10 months ago
- [Paper] [AAAI 2024] Structure-CLIP: Towards Scene Graph Knowledge to Enhance Multi-modal Structured Representations ☆140 · Updated 11 months ago
- Study notes on the official LLaVA code ☆25 · Updated 7 months ago
- Visual Instruction Tuning for Qwen2 Base Model ☆34 · Updated 11 months ago
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ☆16 · Updated 3 months ago
- ☆48 · Updated last year
- A comprehensive survey of Composed Multi-modal Retrieval (CMR), including Composed Image Retrieval (CIR) and Composed Video Retrieval (CV… ☆40 · Updated last week
- (TIP 2023) Concept-Aware Video Captioning: Describing Videos with Effective Prior Information ☆29 · Updated 5 months ago
- A curated list of papers on the topic of Diffusion Models for Multi-Modal ☆28 · Updated last year
- [EMNLP 2024 Findings] The official PyTorch implementation of EchoSight: Advancing Visual-Language Models with Wiki Knowledge ☆61 · Updated 2 months ago
- [ICLR 2025] TRACE: Temporal Grounding Video LLM via Causal Event Modeling ☆97 · Updated 4 months ago
- ☆49 · Updated 11 months ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆147 · Updated 2 weeks ago
- ☆76 · Updated 7 months ago
- Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval… ☆41 · Updated last year
- Implementation of our CVPR 2022 paper, Negative-Aware Attention Framework for Image-Text Matching ☆116 · Updated last year
- An image captioning model based on ClipCap ☆302 · Updated 3 years ago
- PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023) ☆65 · Updated last year
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023) ☆162 · Updated 9 months ago
- The official implementation of RAR ☆88 · Updated last year
- The datasets for image emotion computing ☆34 · Updated 3 years ago
- ☆59 · Updated 2 months ago
- 🔥CVPR 2025 Multimodal Large Language Models Paper List ☆143 · Updated 2 months ago