zjr2000 / Awesome-Multimodal-Chatbot

Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction, such as text, speech, images, and videos, to provide a seamless and versatile user experience.

☆77

Alternatives and similar repositories for Awesome-Multimodal-Chatbot

Users who are interested in Awesome-Multimodal-Chatbot are comparing it to the repositories listed below.

showlab / assistgpt
☆66 · Updated 2 years ago
cg1177 / VideoLLM
VideoLLM: Modeling Video Sequence with Large Language Models
☆157 · Updated last year
mbzuai-oryx / Video-LLaVA
PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models
☆256 · Updated last year
icoz69 / StableLLAVA
Official repo for StableLLAVA
☆95 · Updated last year
isekai-portal / Link-Context-Learning
☆97 · Updated last year
RenShuhuai-Andy / TESTA
[EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
☆50 · Updated last year
Nicous20 / FunQA
FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …
☆102 · Updated 6 months ago
TIGER-AI-Lab / Mantis
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR2024]
☆218 · Updated 3 months ago
YujieLu10 / LLMScore
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
☆132 · Updated last year
SHI-Labs / OLA-VLM
OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024
☆60 · Updated 4 months ago
FudanNLPLAB / MouSi
☆73 · Updated last year
Ahnsun / merlin
[ECCV2024] Official code implementation of Merlin: Empowering Multimodal LLMs with Foresight Minds
☆94 · Updated 11 months ago
swordlidev / Evaluation-Multimodal-LLMs-Survey
A Survey on Benchmarks of Multimodal Large Language Models
☆111 · Updated 3 months ago
OpenGVLab / Awesome-LLM4Tool
A curated list of papers, repositories, tutorials, and anything else related to large language models for tools
☆67 · Updated last year
KastanDay / video-pretrained-transformer
Multi-model video-to-text by combining embeddings from Flan-T5 + CLIP + Whisper + SceneGraph. The 'backbone LLM' is pre-trained from scra…
☆52 · Updated 2 years ago
sanjayss34 / codevqa
☆85 · Updated last year
OpenGVLab / TPO
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆51 · Updated 5 months ago
showlab / VideoGUI
[NeurIPS 2024 D&B] VideoGUI: A Benchmark for GUI Automation from Instructional Videos
☆39 · Updated 2 weeks ago
TIGER-AI-Lab / MEGA-Bench
This repo contains the code for "MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025]
☆68 · Updated 2 months ago
patrick-tssn / Awesome-Colorful-LLM
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…
☆122 · Updated last month
llyx97 / TempCompass
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆117 · Updated 2 months ago
NVlabs / LITA
☆179 · Updated 8 months ago
cliangyu / Cola
[NeurIPS2023] Official implementation of the paper "Large Language Models are Visual Reasoning Coordinators"
☆105 · Updated last year
yuweihao / MM-Vet
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024)
☆303 · Updated 5 months ago
PhoenixZ810 / MG-LLaVA
Official repository for the paper "MG-LLaVA: Towards Multi-Granularity Visual Instruction Tuning" (https://arxiv.org/abs/2406.17770).
☆156 · Updated 9 months ago
imagegridworth / IG-VLM
☆135 · Updated 8 months ago
zzxslp / SoM-LLaVA
[COLM-2024] List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs
☆143 · Updated 10 months ago
kaleido-lab / dolphin
General video interaction platform based on LLMs, including Video ChatGPT
☆251 · Updated last year
kyegomez / PALI
Democratization of "PaLI: A Jointly-Scaled Multilingual Language-Image Model"
☆91 · Updated last year
Ziyang412 / VideoTree
Code for CVPR25 paper "VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos"
☆120 · Updated this week