hhaAndroid / awesome-mm-chat
A collection of multimodal (MM) + Chat resources
☆247Updated last month
Alternatives and similar repositories for awesome-mm-chat:
Users who are interested in awesome-mm-chat are comparing it to the libraries listed below.
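- Mainly records multimodal knowledge for large language model (LLM) algorithm (application) engineers☆153Updated 10 months ago
- DeepSpeed tutorials, annotated examples, and study notes (efficient training of large models)☆151Updated last year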
- The code of the paper "NExT-Chat: An LMM for Chat, Detection and Segmentation".☆237Updated last year
- Personal Project: MPP-Qwen14B & MPP-Qwen-Next(Multimodal Pipeline Parallel based on Qwen-LM). Support [video/image/multi-image] {sft/conv…☆424Updated last week
- Reading notes about Multimodal Large Language Models, Large Language Models, and Diffusion Models☆306Updated last month
- Research Code for Multimodal-Cognition Team in Ant Group☆138Updated 8 months ago
- Efficient Multimodal Large Language Models: A Survey☆326Updated 2 weeks ago
- WWW2025 Multimodal Intent Recognition for Dialogue Systems Challenge☆116Updated 4 months ago
- [TPAMI reviewing] Towards Visual Grounding: A Survey☆113Updated last month
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆90Updated 2 months ago
- A DiT-pytorch codebase, mainly intended for learning the DiT architecture.☆74Updated last year
- [CVPR 2024] LION: Empowering Multimodal Large Language Model with Dual-Level Visual Knowledge☆132Updated 8 months ago
- Official repository for paper MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning(https://arxiv.org/abs/2406.17770).☆153Updated 5 months ago
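- PyTorch model-training code for single precision, half precision, mixed precision, single GPU, multi-GPU (DP / DDP), FSDP, and DeepSpeed, with a comparison of training speed and GPU memory usage across methods☆92Updated last year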
- [TMM 2023] Self-paced Curriculum Adapting of CLIP for Visual Grounding.☆116Updated last month
- A unified evaluation library for multiple machine learning libraries☆261Updated 11 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training☆202Updated 2 months ago
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents☆309Updated 11 months ago
- The repository provides code for running inference with the SegmentAnything Model (SAM), links for downloading the trained model checkpoi…☆94Updated last year
- ☆289Updated last month
- Official implementation of OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion☆297Updated last week
- [CVPR 2024] PixelLM is an effective and efficient LMM for pixel-level reasoning and understanding.☆213Updated last month
- Collection of image and video datasets for generative AI and multimodal visual AI☆22Updated 10 months ago
- Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)☆291Updated last year
- TaiSu(太素)--a large-scale Chinese multimodal dataset (a hundred-million-scale Chinese vision-language pre-training dataset)☆179Updated last year
- [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection☆162Updated 11 months ago
- ☆136Updated last year
- The official implementation of RAR☆82Updated 11 months ago