fuyyyyy / SEPM
[ICML'25 Spotlight] Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models
☆20Updated this week
Alternatives and similar repositories for SEPM
Users that are interested in SEPM are comparing it to the libraries listed below
- ☆23Updated 4 months ago
- [CVPR'25] EMOE: Modality-Specific Enhanced Dynamic Emotion Experts☆54Updated last month
- An official implementation of "Incomplete Multimodality-Diffused Emotion Recognition" in PyTorch. (NeurIPS 2023)☆55Updated last year
- [ECCV’24] Official Implementation for CAT: Enhancing Multimodal Large Language Model to Answer Questions in Dynamic Audio-Visual Scenario…☆55Updated 11 months ago
- [CVPR25 Highlight] A ChatGPT-Prompted Visual hallucination Evaluation Dataset, featuring over 100,000 data samples and four advanced eval…☆21Updated 4 months ago
- Video Chain of Thought, Codes for ICML 2024 paper: "Video-of-Thought: Step-by-Step Video Reasoning from Perception to Cognition"☆159Updated 6 months ago
- Official Repository for "Learning Trimodal Relation for Audio-Visual Question Answering with Missing Modality" (ECCV 2024)☆13Updated 10 months ago
- TCL-MAP is a powerful method for multimodal intent recognition (AAAI 2024)☆45Updated last year
- [CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Att…☆36Updated 6 months ago
- [ICML 2025] Official implementation of paper 'Look Twice Before You Answer: Memory-Space Visual Retracing for Hallucination Mitigation in…☆145Updated last month
- ☆28Updated 2 months ago
- A Python implementation of Certifiable Robust Multi-modal Training☆18Updated 2 months ago
- ☆31Updated last month
- Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation☆31Updated 5 months ago
- Training A Small Emotional Vision Language Model for Visual Art Comprehension☆16Updated last year
- ☆17Updated 4 months ago
- ☆21Updated 7 months ago
- [ICLR 2025] MLLM can see? Dynamic Correction Decoding for Hallucination Mitigation☆98Updated 8 months ago
- HallE-Control: Controlling Object Hallucination in LMMs☆31Updated last year
- [CVPR 2025] Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation☆65Updated 2 months ago
- [ICML 2024] Official implementation for "HALC: Object Hallucination Reduction via Adaptive Focal-Contrast Decoding"☆95Updated 8 months ago
- Official Implementation of CODE☆15Updated 11 months ago
- The repo for "On-the-fly Modulation for Balanced Multimodal Learning", T-PAMI 2024☆17Updated 11 months ago
- LongVALE: Vision-Audio-Language-Event Benchmark Towards Time-Aware Omni-Modal Perception of Long Videos. (CVPR 2025)☆47Updated 2 months ago
- [ICCV 2025] ONLY: One-Layer Intervention Sufficiently Mitigates Hallucinations in Large Vision-Language Models☆33Updated last month
- Code for paper: Nullu: Mitigating Object Hallucinations in Large Vision-Language Models via HalluSpace Projection☆39Updated 5 months ago
- [CVPR 2024] Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension☆55Updated last year
- [ICML 2024] "Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models"☆55Updated 11 months ago
- 🚀 Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆31Updated last month
- TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models☆16Updated 7 months ago