TencentARC-QQ / TagGPT
TagGPT: Large Language Models are Zero-shot Multimodal Taggers
☆61 Updated last year
Related projects
Alternatives and complementary repositories for TagGPT
- The world's first large-scale multi-modal short-video encyclopedia, where the primitive units are items, aspects, and short videos. ☆60 Updated 11 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent ☆70 Updated last week
- Chinese CLIP models with SOTA performance. ☆48 Updated last year
- Product1M ☆86 Updated 2 years ago
- Narrative movie understanding benchmark ☆60 Updated 6 months ago
- An open-source multimodal large language model based on baichuan-7b ☆72 Updated 11 months ago
- A curated list of resources about long-context in large-language models and video understanding. ☆30 Updated last year
- ☆61 Updated last year
- ☆53 Updated 3 months ago
- ☆23 Updated 3 months ago
- ☆17 Updated last year
- Bling's Object detection tool ☆56 Updated last year
- Empirical Study Towards Building An Effective Multi-Modal Large Language Model ☆23 Updated last year
- SkyScript-100M: 1,000,000,000 Pairs of Scripts and Shooting Scripts for Short Drama: https://arxiv.org/abs/2408.09333v2 ☆102 Updated last week
- A simple MLLM that surpasses QwenVL-Max using only open-source data, with a 14B LLM. ☆36 Updated 2 months ago
- Source code for EMNLP2022 long paper: Parameter-Efficient Tuning Makes a Good Classification Head ☆13 Updated 2 years ago
- ☆30 Updated 6 months ago
- ☆66 Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models ☆78 Updated 10 months ago
- ☆35 Updated 2 months ago
- Video dataset dedicated to portrait-mode video recognition. ☆38 Updated 7 months ago
- This project uses the LLaVA 1.6 multimodal model to implement text-to-image and image-to-image search. ☆17 Updated 8 months ago
- 🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral) ☆63 Updated 11 months ago
- Official Repository of MMLONGBENCH-DOC: Benchmarking Long-context Document Understanding with Visualizations ☆57 Updated 4 months ago
- Code for the paper "RankingGPT: Empowering Large Language Models in Text Ranking with Progressive Enhancement" ☆29 Updated 10 months ago
- Unleashing the Power of Cognitive Dynamics on Large Language Models ☆60 Updated 2 months ago
- Multimodal chatbot with computer vision capabilities integrated ☆99 Updated 6 months ago
- Research Code for Multimodal-Cognition Team in Ant Group ☆123 Updated 4 months ago
- A curated list of papers, repositories, tutorials, and anything else related to large language models for tools ☆65 Updated last year
- The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs" ☆73 Updated this week