weiji-Feng / Image2Poem
A project that generates ancient-style poems from pictures, using the CLIP, T5, and GPT2 models
☆21 · Updated last year
Alternatives and similar repositories for Image2Poem:
Users who are interested in Image2Poem are comparing it to the libraries listed below:
- A ClipCap-based image captioning model ☆294 · Updated 2 years ago
- Chinese OFA model with a transformers-style structure ☆123 · Updated last year
- Research Code for Multimodal-Cognition Team in Ant Group ☆133 · Updated 6 months ago
- This project aims to retrieve images that match an input text description. ☆31 · Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge ☆128 · Updated 6 months ago
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset) ☆177 · Updated last year
- A collection of multimodal (MM) + Chat resources ☆238 · Updated 3 weeks ago
- ☆158 · Updated last year
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge ☆113 · Updated 2 months ago
- Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation. ☆166 · Updated 2 years ago
- Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer". ☆124 · Updated 2 months ago
- Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval… ☆42 · Updated last year
- ☆43 · Updated last year
- Implementation of our CVPR2022 paper, Negative-Aware Attention Framework for Image-Text Matching. ☆111 · Updated last year
- Transferable Decoding with Visual Entities for Zero-Shot Image Captioning, ICCV 2023 ☆151 · Updated 4 months ago
- Towards Efficient and Effective Text-to-Video Retrieval with Coarse-to-Fine Visual Representation Learning ☆12 · Updated 10 months ago
- [NeurIPS 2022 Spotlight] Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations ☆127 · Updated 9 months ago
- Building a VLM model starts from the basic module. ☆11 · Updated 9 months ago
- [MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models ☆284 · Updated 2 weeks ago
- mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections. (EMNLP 2022) ☆87 · Updated last year
- Visual Instruction Tuning for Qwen2 Base Model ☆22 · Updated 7 months ago
- Notes mainly covering multimodal knowledge for large language model (LLM) algorithm (application) engineers ☆118 · Updated 8 months ago
- ☆57 · Updated 2 years ago
- Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks ☆290 · Updated last year
- Update 2020 ☆75 · Updated 2 years ago
- ☆38 · Updated 7 months ago
- ☆43 · Updated 2 years ago
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models ☆238 · Updated 3 weeks ago
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval ☆38 · Updated last year
- The official implementation of RAR ☆79 · Updated 10 months ago