haowei-freesky / HERMES
Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"
☆38Updated this week
Alternatives and similar repositories for HERMES
Users that are interested in HERMES are comparing it to the libraries listed below
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆114Updated 2 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning?☆86Updated 6 months ago
- Official release of "Spatial-SSRL: Enhancing Spatial Understanding via Self-Supervised Reinforcement Learning"☆108Updated last month
- A collection of awesome think with videos papers.☆83Updated last month
- Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"☆127Updated last month
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Updated last year
- Incentivizing "Thinking with Long Videos" via Native Tool Calling☆183Updated this week
- Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆58Updated 7 months ago
- ☆96Updated 7 months ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence☆74Updated this week
- ☆41Updated 7 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Updated 6 months ago
- ☆27Updated 9 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆89Updated 6 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning☆41Updated 5 months ago
- Official implementation of MIA-DPO☆70Updated last year
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆72Updated 2 months ago
- ☆65Updated 2 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆52Updated 6 months ago
- ☆63Updated 6 months ago
- VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆55Updated 2 weeks ago
- The code repository of UniRL☆50Updated 7 months ago
- ☆39Updated 8 months ago
- E-GRPO: High Entropy Steps Drive Effective Reinforcement Learning for Flow Models☆24Updated 3 weeks ago
- Pixel-Level Reasoning Model trained with RL [NeurIPS 2025]☆267Updated 2 months ago
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"☆34Updated 7 months ago
- This is the official repository of InfiniteVL☆76Updated last month
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆47Updated last month
- [ICLR'26] Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology☆72Updated this week
- [AAAI 26 Demo] Official repo for CAT-V - Caption Anything in Video: Object-centric Dense Video Captioning with Spatiotemporal Multimodal P…☆63Updated 2 months ago