Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆83May 8, 2026Updated this week
Alternatives and similar repositories for HERMES
Users who are interested in HERMES are comparing it to the libraries listed below.
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 10 months ago
- [ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning☆31Jan 14, 2026Updated 3 months ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆33Feb 22, 2026Updated 2 months ago
- (ICLR 2026) Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing'☆59Jan 26, 2026Updated 3 months ago
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability☆17May 8, 2025Updated last year
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding☆228Updated this week
- ☆18Aug 7, 2025Updated 9 months ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference☆10Dec 15, 2024Updated last year
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆78Apr 20, 2026Updated 2 weeks ago
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆57Jan 22, 2026Updated 3 months ago
- 🔥🔥[NeurIPS2025]Exploring and mitigating semantic hallucinations in scene text perception and reasoning☆28Dec 11, 2025Updated 4 months ago
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"☆64Jan 7, 2026Updated 4 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆25Oct 17, 2024Updated last year
- ☆81Apr 28, 2026Updated last week
- Official implementation of "URECA : Unique Region Caption Anything"☆57Jul 13, 2025Updated 9 months ago
- [NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding☆52Sep 21, 2025Updated 7 months ago
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head (ICLR 2026)☆144Apr 17, 2026Updated 3 weeks ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆59Oct 9, 2025Updated 7 months ago
- ☆20Jun 10, 2025Updated 10 months ago
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆24Oct 22, 2025Updated 6 months ago
- [ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval☆114Nov 4, 2025Updated 6 months ago
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025☆17Nov 24, 2024Updated last year
- Official resource for paper Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (ACL 20…☆15Aug 12, 2024Updated last year
- ☆53Jan 30, 2026Updated 3 months ago
- ☆13Apr 23, 2025Updated last year
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]☆21Aug 21, 2025Updated 8 months ago
- Official repo for "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer"☆19Sep 29, 2023Updated 2 years ago
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression☆63Feb 25, 2026Updated 2 months ago
- Official PyTorch Implementation for the "What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-mod…☆20Sep 26, 2024Updated last year
- LLaVA-Next for STVG☆19Dec 5, 2025Updated 5 months ago
- ☆24Feb 18, 2025Updated last year
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…☆44Feb 5, 2025Updated last year
- (IJCV 2023) Official implementation of "SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels"☆13Mar 20, 2025Updated last year
- [CVPR 2025] iSegMan: Interactive Segment-and-Manipulate 3D Gaussians 🔥🔥🔥☆23Mar 12, 2025Updated last year
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆42Jan 27, 2026Updated 3 months ago
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"☆29Apr 16, 2024Updated 2 years ago
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything☆71Apr 7, 2024Updated 2 years ago
- [NeurIPS'25] Time-R1: Post-Training Large Vision Language Model for Temporal Video Grounding☆91Dec 14, 2025Updated 4 months ago
- [ICCV 2025] AdsQA: Towards Advertisement Video Understanding Arxiv: https://arxiv.org/abs/2509.08621☆34Oct 30, 2025Updated 6 months ago