Vision-CAIR / affectiveVisDial
☆12 · Updated 7 months ago
Alternatives and similar repositories for affectiveVisDial:
Users interested in affectiveVisDial are comparing it to the repositories listed below.
- The official implementation of the paper S2-VER: Semi-Supervised Visual Emotion Recognition ☆11 · Updated 9 months ago
- Explainable Multimodal Emotion Reasoning (EMER) and AffectGPT ☆131 · Updated 9 months ago
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario… ☆48 · Updated 5 months ago
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The… ☆53 · Updated 7 months ago
- [ACL 2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information ☆12 · Updated 3 months ago
- Official repository for the A-OKVQA dataset ☆75 · Updated 9 months ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023) ☆98 · Updated 3 weeks ago
- Repo for the EMNLP 2023 paper "A Simple Knowledge-Based Visual Question Answering" ☆22 · Updated last year
- Code and dataset of "MEmoR: A Dataset for Multimodal Emotion Reasoning in Videos" in MM'20 ☆53 · Updated last year
- The official implementation of the ECCV 2024 paper "Facial Affective Behavior Analysis with Instruction Tuning" ☆23 · Updated last month
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering ☆186 · Updated last year
- Training A Small Emotional Vision Language Model for Visual Art Comprehension ☆15 · Updated 6 months ago
- Reproduction of 'Weakly Supervised Coupled Networks for Visual Sentiment Analysis' ☆14 · Updated 5 years ago
- GPT-4V with Emotion ☆89 · Updated last year
- The Social-IQ 2.0 Challenge Release for the Artificial Social Intelligence Workshop at ICCV '23 ☆22 · Updated last year
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight) ☆63 · Updated 7 months ago
- [ACM MM 2022] Multi-Modal Experience Inspired AI Creation ☆20 · Updated 2 months ago
- PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral) ☆121 · Updated last year
- Official implementation of HawkEye: Training Video-Text LLMs for Grounding Text in Videos ☆37 · Updated 9 months ago
- [CVPR'23 Highlight] AutoAD: Movie Description in Context ☆92 · Updated 3 months ago
- ☆28 · Updated 3 months ago
- ☆30 · Updated 10 months ago
- NewsCLIPpings: Automatic Generation of Out-of-Context Multimodal Media, EMNLP 2021 ☆38 · Updated 5 months ago
- NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR'21) ☆142 · Updated 6 months ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding" ☆80 · Updated 2 months ago
- [CVPR 2024] MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos ☆31 · Updated 3 weeks ago
- ☆35 · Updated 2 years ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models ☆156 · Updated 2 months ago
- ☆15 · Updated 3 months ago
- ☆53 · Updated 6 months ago