sanbuphy / llm-vision-datasets
Collection of image and video datasets for generative AI and multimodal visual AI
☆17Updated 4 months ago
Related projects:
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆72Updated 3 months ago
- Research Code for Multimodal-Cognition Team in Ant Group☆111Updated 2 months ago
- Multimodal MM + Chat collection☆187Updated 2 weeks ago
- AAAI 2024: Visual Instruction Generation and Correction☆86Updated 7 months ago
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆114Updated 2 months ago
- PromptDet: Towards Open-vocabulary Detection using Uncurated Images, ECCV2022☆159Updated 2 years ago
- ☆76Updated 7 months ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆47Updated 5 months ago
- A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing☆282Updated 2 months ago
- ☆90Updated last year
- Official repository of MMDU dataset☆61Updated last month
- ☆106Updated 3 months ago
- A paper list of recent works on token compression for ViT and VLM☆32Updated last week
- [CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from Fine-grained Correctional Human Feedback☆218Updated last week
- ☆100Updated 7 months ago
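- Projects and tutorials for MiniCPM and MiniCPM-V, covering six topics: inference, quantization, edge deployment, fine-tuning, technical reports, and applications☆87Updated this week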
- Efficient Multimodal Large Language Models: A Survey☆230Updated last month
- TaiSu (太素): a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)☆172Updated 10 months ago
- 1st-place solution for the Webly-supervised Fine-grained Recognition competition at https://www.cvmart.net/race/10412/base☆33Updated last year
- Recognize Any Regions☆115Updated 9 months ago
- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models)☆94Updated last year
- OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text☆246Updated 3 weeks ago
- ☆31Updated 2 months ago
- [CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception☆474Updated 4 months ago
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".☆204Updated 7 months ago
- Official implementation of the paper "OvarNet: Towards Open-vocabulary Object Attribute Recognition"☆98Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model☆239Updated 2 months ago
- ☆100Updated last month
- Lion: Kindling Vision Intelligence within Large Language Models☆52Updated 7 months ago
- Explore the Limits of Omni-modal Pretraining at Scale☆80Updated 2 weeks ago