weiji-Feng / Image2Poem
A project that generates ancient poems from pictures, using CLIP, T5, and GPT2 models
☆17Updated last year
Related projects:
- An image captioning model based on ClipCap☆271Updated 2 years ago
- ChartMimic: Evaluating LMM’s Cross-Modal Reasoning Capability via Chart-to-Code Generation☆80Updated 2 months ago
- [ACL 2024 Best Paper] Deciphering Oracle Bone Language with Diffusion Models☆65Updated 3 weeks ago
- Official implementation of "Open-Vocabulary Multi-Label Classification via Multi-Modal Knowledge Transfer".☆120Updated last year
- TaiSu (太素): a large-scale Chinese multimodal dataset (a Chinese vision-language pre-training dataset at the hundred-million scale)☆172Updated 10 months ago
- A paper list about diffusion models for natural language processing.☆170Updated last year
- Update 2020☆68Updated 2 years ago
- Notes on multimodal knowledge for large language model (LLM) algorithm (application) engineers☆62Updated 4 months ago
- A collection of multimodal (MM) + Chat resources☆187Updated 2 weeks ago
- Efficient Multimodal Large Language Models: A Survey☆230Updated last month
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models☆76Updated 2 weeks ago
- Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.☆164Updated last year
- A modified LLaVA framework for MOSS2 that makes MOSS2 a multimodal model.☆12Updated 2 weeks ago
- Official Code for the ICCV23 Paper: "LexLIP: Lexicon-Bottlenecked Language-Image Pre-Training for Large-Scale Image-Text Sparse Retrieval…☆41Updated 11 months ago
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆218Updated last week
- [MIR-2023-Survey] A continuously updated paper list for multi-modal pre-trained big models☆278Updated 2 weeks ago
- A collection of awesome text-to-image generation studies.☆326Updated last week
- A curated list of awesome Multimodal studies.☆67Updated last month
- Research Code for Multimodal-Cognition Team in Ant Group☆111Updated 2 months ago
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning☆28Updated 2 months ago
- Search photos on Unsplash based on OpenAI's CLIP model, support search with joint image+text queries and attention visualization.☆206Updated 3 years ago
- ☆155Updated 10 months ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo…☆255Updated 3 weeks ago
- A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text remova…☆186Updated last month
- All-In-One VLM: Image + Video + Transfer to Other Languages / Domains (TPAMI 2023)☆131Updated 3 weeks ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆114Updated 2 months ago
- Visual Instruction Tuning for Qwen2 Base Model☆14Updated 2 months ago
- Official repository of MMDU dataset☆61Updated last month
- [CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models☆196Updated last week
- Bridging Vision and Language Model☆279Updated last year