Vision-CAIR / affectiveVisDial
☆11Updated 4 months ago
Related projects
Alternatives and complementary repositories for affectiveVisDial
- The official implementation of the paper S2-VER: Semi-Supervised Visual Emotion Recognition☆11Updated 6 months ago
- Training A Small Emotional Vision Language Model for Visual Art Comprehension☆13Updated 3 months ago
- Hierarchical Video-Moment Retrieval and Step-Captioning (CVPR 2023)☆95Updated last year
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…☆41Updated 2 months ago
- Explainable Multimodal Emotion Reasoning (EMER) and AffectGPT☆120Updated 6 months ago
- A PyTorch implementation of EmpiricalMVM☆39Updated 11 months ago
- [ACL2023] VSTAR is a multimodal dialogue dataset with scene and topic transition information☆12Updated 3 weeks ago
- Can I Trust Your Answer? Visually Grounded Video Question Answering (CVPR'24, Highlight)☆58Updated 4 months ago
- Official repository for the A-OKVQA dataset☆64Updated 6 months ago
- [CVPR 2022] A large-scale public benchmark dataset for video question-answering, especially about evidence and commonsense reasoning. The…☆51Updated 4 months ago
- [NeurIPS 2022] Zero-Shot Video Question Answering via Frozen Bidirectional Language Models☆156Updated last year
- Official repository for "eMotions: A Large-Scale Dataset for Emotion Recognition in Short Videos"☆31Updated 5 months ago
- Repo for paper: "Paxion: Patching Action Knowledge in Video-Language Foundation Models" Neurips 23 Spotlight☆35Updated last year
- [NeurIPS 2023] Self-Chained Image-Language Model for Video Localization and Question Answering☆178Updated 10 months ago
- PyTorch code for “TVLT: Textless Vision-Language Transformer” (NeurIPS 2022 Oral)☆120Updated last year
- Official code for CVPR 2024 paper: Discriminative Probing and Tuning for Text-to-Image Generation☆25Updated 2 months ago
- [CVPR 2023] VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval☆38Updated last year
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023)☆43Updated 11 months ago
- Official implementation for "A Simple LLM Framework for Long-Range Video Question-Answering"☆85Updated 3 weeks ago
- [CVPR 2023] Code for "Learning Emotion Representations from Verbal and Nonverbal Communication"☆39Updated last year
- [CVPR 2023] Official code repository for "How you feelin'? Learning Emotions and Mental States in Movie Scenes". https://arxiv.org/abs/23…☆57Updated last month
- [CVPR 2024] Context-Guided Spatio-Temporal Video Grounding☆42Updated 4 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆89Updated last week
- VideoHallucer, the first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆22Updated 4 months ago
- FunQA benchmarks funny, creative, and magic videos for challenging tasks including timestamp localization, video description, reasoning, …☆96Updated 4 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆47Updated last year