Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆90 · May 8, 2026 · Updated last month
Alternatives and similar repositories for HERMES
Users that are interested in HERMES are comparing it to the libraries listed below.
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w… ☆12 · Jun 28, 2025 · Updated 11 months ago
- [ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning ☆36 · Jan 14, 2026 · Updated 5 months ago
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing' ☆60 · Jan 26, 2026 · Updated 4 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models ☆24 · Apr 18, 2026 · Updated 2 months ago
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability ☆17 · May 8, 2025 · Updated last year
- ☆19 · Aug 7, 2025 · Updated 10 months ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference ☆10 · Dec 15, 2024 · Updated last year
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe… ☆57 · Jan 22, 2026 · Updated 4 months ago
- 🔥🔥 [NeurIPS 2025] Exploring and mitigating semantic hallucinations in scene text perception and reasoning ☆30 · Dec 11, 2025 · Updated 6 months ago
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models ☆91 · Apr 20, 2026 · Updated last month
- Official implementation of EgoThinker at NIPS 2025 ☆29 · Nov 25, 2025 · Updated 6 months ago
- 🔥🔥🔥 [Awesome] Latest Papers, Codes & Datasets on Streaming / Online Video Understanding — Building Always-on, Real-time Video AI 🤖 ☆290 · Jun 2, 2026 · Updated 2 weeks ago
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?" ☆70 · Jan 7, 2026 · Updated 5 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration ☆25 · Oct 17, 2024 · Updated last year
- ☆45 · Jan 1, 2026 · Updated 5 months ago
- Official implementation of "URECA: Unique Region Caption Anything" ☆58 · Jul 13, 2025 · Updated 11 months ago
- [NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding ☆53 · Sep 21, 2025 · Updated 8 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning" ☆63 · Oct 9, 2025 · Updated 8 months ago
- [ICLR 2026 🔥] MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head ☆151 · May 19, 2026 · Updated last month
- ☆20 · Jun 10, 2025 · Updated last year
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation". ☆25 · Oct 22, 2025 · Updated 7 months ago
- [ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval ☆119 · Nov 4, 2025 · Updated 7 months ago
- ☆55 · Jan 30, 2026 · Updated 4 months ago
- ☆14 · Apr 23, 2025 · Updated last year
- Official resource for paper Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (ACL 20… ☆16 · Aug 12, 2024 · Updated last year
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025] ☆21 · Aug 21, 2025 · Updated 9 months ago
- Official repo for "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer" ☆19 · Sep 29, 2023 · Updated 2 years ago
- Official PyTorch Implementation for the "What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-mod… ☆20 · Sep 26, 2024 · Updated last year
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression ☆67 · Jun 8, 2026 · Updated last week
- Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens" ☆366 · Apr 17, 2026 · Updated 2 months ago
- ☆24 · Feb 18, 2025 · Updated last year
- This repository contains the code for the paper "Neuro-Symbolic Query Compiler", accepted to the Findings of ACL 2025. ☆17 · Oct 20, 2025 · Updated 7 months ago
- LLaVA-Next for STVG ☆21 · Dec 5, 2025 · Updated 6 months ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact… ☆43 · Feb 5, 2025 · Updated last year
- [CVPR 2025] iSegMan: Interactive Segment-and-Manipulate 3D Gaussians 🔥🔥🔥 ☆23 · Mar 12, 2025 · Updated last year
- [AAAI 2026] SlideTailor: Personalized Presentation Slide Generation for Scientific Papers ☆55 · Apr 18, 2026 · Updated 2 months ago
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models ☆42 · Jan 27, 2026 · Updated 4 months ago
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision" ☆29 · Apr 16, 2024 · Updated 2 years ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything ☆71 · Apr 7, 2024 · Updated 2 years ago