Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding" [ACL 2026]
☆69Apr 15, 2026Updated this week
Alternatives and similar repositories for HERMES
Users that are interested in HERMES are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 9 months ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆33Feb 22, 2026Updated last month
- (ICLR 2026)Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing’☆59Jan 26, 2026Updated 2 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Jan 1, 2026Updated 3 months ago
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability☆17May 8, 2025Updated 11 months ago
- AI Agents on DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding☆207Updated this week
- ☆18Aug 7, 2025Updated 8 months ago
- Cross-Self KV Cache Pruning for Efficient Vision-Language Inference☆10Dec 15, 2024Updated last year
- [CVPR 2026] OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆73Apr 8, 2026Updated last week
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆56Jan 22, 2026Updated 2 months ago
- [ICLR 2025] Data-Augmented Phrase-Level Alignment for Mitigating Object Hallucination☆20Jan 27, 2025Updated last year
- 🔥🔥[NeurIPS2025]Exploring and mitigating semantic hallucinations in scene text perception and reasoning☆28Dec 11, 2025Updated 4 months ago
- [ICLR 2026] MMDuet2: Enhancing Proactive Interaction of Video MLLMs with Multi-Turn Reinforcement Learning☆25Jan 14, 2026Updated 3 months ago
- Official implementation of EgoThinker at NIPS 2025☆25Nov 25, 2025Updated 4 months ago
- GPU virtual machines on DigitalOcean Gradient AI • AdGet to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"☆63Jan 7, 2026Updated 3 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- Official implementation of "URECA : Unique Region Caption Anything"☆57Jul 13, 2025Updated 9 months ago
- [ICCV 2025] Official code for "AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning"☆59Oct 9, 2025Updated 6 months ago
- [NeurIPS'25] ReAgent-V: A Reward-Driven Multi-Agent Framework for Video Understanding☆51Sep 21, 2025Updated 6 months ago
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head (ICLR 2026)☆141Feb 6, 2026Updated 2 months ago
- ☆19Jun 10, 2025Updated 10 months ago
- Code and data for paper "Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation".☆24Oct 22, 2025Updated 5 months ago
- [ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval☆114Nov 4, 2025Updated 5 months ago
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
- Official implementation of "OpenCity3D: What do Vision-Language Models know about Urban Environments?" @ WACV2025☆18Nov 24, 2024Updated last year
- Official resource for paper Investigating and Mitigating the Multimodal Hallucination Snowballing in Large Vision-Language Models (ACL 20…☆15Aug 12, 2024Updated last year
- ☆54Jan 30, 2026Updated 2 months ago
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]☆21Aug 21, 2025Updated 7 months ago
- ☆15Nov 1, 2024Updated last year
- ☆15Dec 20, 2024Updated last year
- Official repo for "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer"☆19Sep 29, 2023Updated 2 years ago
- Official PyTorch Implementation for the "What if...?: Thinking Counterfactual Keywords Helps to Mitigate Hallucination in Large Multi-mod…☆20Sep 26, 2024Updated last year
- Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"☆341Jan 6, 2026Updated 3 months ago
- Serverless GPU API endpoints on Runpod - Bonus Credits • AdSkip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
- LLaVA-Next for STVG☆19Dec 5, 2025Updated 4 months ago
- ☆24Feb 18, 2025Updated last year
- This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025.☆16Oct 20, 2025Updated 5 months ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…☆44Feb 5, 2025Updated last year
- [CVPR 2025] iSegMan: Interactive Segment-and-Manipulate 3D Gaussians 🔥🔥🔥☆23Mar 12, 2025Updated last year
- [AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models☆42Jan 27, 2026Updated 2 months ago
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"☆29Apr 16, 2024Updated 2 years ago