Aman-4-Real / MMTG

[ACM MM 2022]: Multi-Modal Experience Inspired AI Creation

☆20

Alternatives and similar repositories for MMTG

Users who are interested in MMTG are comparing it to the repositories listed below.

SihengLi99 / TextBind
[2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wild
☆47 Updated last year
Aman-4-Real / awesome-multimodal-dialogue
Paper, dataset and code list for multimodal dialogue.
☆21 Updated 6 months ago
victorsungo / MMDialog
The official site of paper MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
☆198 Updated last year
MikeWangWZHL / VidIL
Pytorch code for Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
☆115 Updated 2 years ago
YunxinLi / LingCloud
Attaching human-like eyes to the large language model. The codes of IEEE TMM paper "LMEye: An Interactive Perception Network for Large La…
☆48 Updated last year
PLUM-Lab / MultiInstruct
MultiInstruct: Improving Multi-Modal Zero-Shot Learning via Instruction Tuning
☆135 Updated 2 years ago
Aman-4-Real / See-or-Guess
[ACM MM 2024] See or Guess: Counterfactually Regularized Image Captioning
☆14 Updated 5 months ago
yuezih / Movie101
Narrative movie understanding benchmark
☆73 Updated last month
MichaelZhouwang / VLUE
This repo contains codes and instructions for baselines in the VLUE benchmark.
☆41 Updated 3 years ago
patrick-tssn / Awesome-Colorful-LLM
Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,…
☆123 Updated last month
ShannonAI / OpenViDial
Code, Models and Datasets for OpenViDial Dataset
☆131 Updated 3 years ago
RUC-AIMind / TikTalk
☆70 Updated last month
irfl-dataset / IRFL
IRFL: Image Recognition of Figurative Language
☆11 Updated last year
YujieLu10 / TIP
Multimodal-Procedural-Planning
☆92 Updated 2 years ago
VRU-NExT / VideoQA
☆92 Updated 2 years ago
AIM3-RUC / VideoIC
Danmaku dataset
☆11 Updated 2 years ago
qijimrc / mm_evaluation
☆11 Updated 11 months ago
limanling / KnowledgeVL-Reading
☆68 Updated 2 years ago
edchengg / infoseek_eval
EMNLP2023 - InfoSeek: A New VQA Benchmark focus on Visual Info-Seeking Questions
☆24 Updated last year
Yuco-Z / Awesome-Multi-Modal-Dialog
[Paperlist] Awesome paper list of multimodal dialog, including methods, datasets and metrics
☆39 Updated 5 months ago
open-vision-language / infoseek
☆54 Updated last year
ictnlp / PLUVR
Code for ACL 2022 main conference paper "Neural Machine Translation with Phrase-Level Universal Visual Representations".
☆21 Updated last year
TIGER-AI-Lab / UniIR
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆155 Updated 9 months ago
joez17 / ChatBridge
ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…
☆52 Updated last year
doc-doc / NExT-QA
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21)
☆161 Updated 11 months ago
open-vision-language / oven
☆39 Updated last year
albertwy / GPT-4V-Evaluation
Data for evaluating GPT-4V
☆11 Updated last year
microsoft / PICa
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA, AAAI 2022 (Oral)
☆85 Updated 3 years ago
Nicous20 / FunQA
FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …
☆102 Updated 7 months ago
zmykevin / UC2
CVPR 2021 Official Pytorch Code for UC2: Universal Cross-lingual Cross-modal Vision-and-Language Pre-training
☆34 Updated 3 years ago