HumanOmni
☆217Mar 10, 2025Updated 11 months ago
Alternatives and similar repositories for HumanOmni
Users that are interested in HumanOmni are comparing it to the libraries listed below
- ☆997Mar 24, 2025Updated 11 months ago
- ☆22Jan 17, 2025Updated last year
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆45Jul 1, 2025Updated 8 months ago
- ☆149Jul 31, 2025Updated 7 months ago
- Enhancing Representations through Heterogeneous Self-Supervised Learning (TPAMI 2025)☆14May 2, 2025Updated 10 months ago
- EMER, OV-MER (ICML25), AffectGPT (ICML25, Oral), EmoPrefer (ICLR26)☆337Feb 24, 2026Updated last week
- Collect papers about Mamba (a selective state space model).☆14Aug 6, 2024Updated last year
- Official implementation of "Re-Aligning Language to Visual Objects with an Agentic Workflow"☆31Apr 20, 2025Updated 10 months ago
- Toolkits for Multimodal Emotion Recognition☆286May 17, 2025Updated 9 months ago
- Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning☆525Nov 17, 2025Updated 3 months ago
- [ACM ICMR'25] Official repository for "eMotions: A Large-Scale Dataset for Emotion Recognition in Short Videos"☆37Jul 21, 2025Updated 7 months ago
- Frontier Multimodal Foundation Models for Image and Video Understanding☆1,109Aug 14, 2025Updated 6 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆41Jan 26, 2026Updated last month
- Exploring Feature Self-relation for Self-supervised Transformer (TPAMI 2023)☆21Apr 30, 2025Updated 10 months ago
- Ola: Pushing the Frontiers of Omni-Modal Language Model☆385Jun 13, 2025Updated 8 months ago
- Witness the aha moment of VLM with less than $3.☆4,036May 19, 2025Updated 9 months ago
- GPT-4V with Emotion☆96Dec 8, 2023Updated 2 years ago
- Code for the paper "Joint Co-Speech Gesture and Expressive Talking Face Generation using Diffusion with Adapters"☆24Jan 7, 2025Updated last year
- [ICML'25 Spotlight] Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models☆46Jan 21, 2026Updated last month
- EmoCapCLIP: Learning Transferable Facial Emotion Representations from Large-Scale Semantically Rich Captions☆20Jul 29, 2025Updated 7 months ago
- ☆21Jun 16, 2025Updated 8 months ago
- ☆14Feb 22, 2025Updated last year
- A new dataset adapted from DeformThings4D for non-isometric shape matching task☆29Apr 24, 2023Updated 2 years ago
- Awesome-Emotion-Reasoning is a collection of Emotion-Reasoning works, including papers, codes and datasets☆77Dec 16, 2025Updated 2 months ago
- Open3D-based implementation of DynamicFusion (2015); Python implementation of the NeurIPS 2020 paper Neural Non-Rigid Tracking.☆28Jul 24, 2023Updated 2 years ago
- ☆11Jan 8, 2025Updated last year
- ☆21Feb 13, 2026Updated 2 weeks ago
- ☆18May 23, 2025Updated 9 months ago
- [ Arxiv 2023 ] This repository contains the code for "MUPPET: Multi-Modal Few-Shot Temporal Action Detection"☆15Aug 30, 2023Updated 2 years ago
- ☆13Sep 26, 2025Updated 5 months ago
- [CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts☆17Apr 2, 2025Updated 11 months ago
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆83Feb 27, 2025Updated last year
- ☆27Apr 29, 2025Updated 10 months ago
- ☆37May 28, 2025Updated 9 months ago
- ☆16Sep 25, 2025Updated 5 months ago
- ☆12Sep 15, 2024Updated last year
- ☆10Jul 14, 2023Updated 2 years ago
- ☆17Jun 26, 2025Updated 8 months ago
- [ICLR2026] VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling☆511Nov 18, 2025Updated 3 months ago