Official Repository for paper "HERMES: KV Cache as Hierarchical Memory for Efficient Streaming Video Understanding"
☆63Mar 23, 2026Updated this week
Alternatives and similar repositories for HERMES
Users that are interested in HERMES are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆12Jun 28, 2025Updated 9 months ago
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆31Feb 22, 2026Updated last month
- [Awesome] 🔥🔥🔥 Latest Papers, Codes and Datasets on Streaming / Online Video Understanding☆150Jan 13, 2026Updated 2 months ago
- [ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models☆24Jan 1, 2026Updated 2 months ago
- SpaceVLLM: Endowing Multimodal Large Language Model with Spatio-Temporal Video Grounding Capability☆17May 8, 2025Updated 10 months ago
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- ☆18Aug 7, 2025Updated 7 months ago
- OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models☆66Mar 20, 2026Updated last week
- Official implementation of EgoThinker at NIPS 2025☆25Nov 25, 2025Updated 4 months ago
- ☆42Jan 1, 2026Updated 2 months ago
- MHLA: Restoring Expressivity of Linear Attention via Token-Level Multi-Head (ICLR 2026)☆136Feb 6, 2026Updated last month
- [ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval☆109Nov 4, 2025Updated 4 months ago
- ☆58Jan 30, 2026Updated last month
- (ICLR 2026)Official repository of 'ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing’☆59Jan 26, 2026Updated 2 months ago
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams☆55Mar 15, 2026Updated 2 weeks ago
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]☆21Aug 21, 2025Updated 7 months ago
- ☆15Nov 1, 2024Updated last year
- ☆15Dec 20, 2024Updated last year
- Official repo for "DynaMITe: Dynamic Query Bootstrapping for Multi-object Interactive Segmentation Transformer"☆19Sep 29, 2023Updated 2 years ago
- [AAAI 2026] SlideTailor: Personalized Presentation Slide Generation for Scientific Papers☆48Jan 1, 2026Updated 2 months ago
- [CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression☆51Feb 25, 2026Updated last month
- Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"☆335Jan 6, 2026Updated 2 months ago
- LLaVA-Next for STVG☆18Dec 5, 2025Updated 3 months ago
- ☆24Feb 18, 2025Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- This repository contains the code for the paper “Neuro-Symbolic Query Compiler”, accepted to the Findings of ACL 2025.☆16Oct 20, 2025Updated 5 months ago
- Official implementation of paper VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interact…☆43Feb 5, 2025Updated last year
- (IJCV 2023) Offical implementation of "SCT: A Simple Baseline for Parameter-Efficient Fine-Tuning via Salient Channels"☆13Mar 20, 2025Updated last year
- [CVPR 2025] iSegMan: Interactive Segment-and-Manipulate 3D Gaussians 🔥🔥🔥☆24Mar 12, 2025Updated last year
- Medical SAM3: A Foundation Model for Universal Prompt-Driven Medical Image Segmentation☆115Jan 20, 2026Updated 2 months ago
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆56Jan 22, 2026Updated 2 months ago
- The official repo for "Where do Large Vision-Language Models Look at when Answering Questions?"☆61Jan 7, 2026Updated 2 months ago
- Code implementation for paper titled "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"☆29Apr 16, 2024Updated last year
- Official Repo for PosSAM: Panoptic Open-vocabulary Segment Anything☆70Apr 7, 2024Updated last year
- Wordpress hosting with auto-scaling on Cloudways • AdFully Managed hosting built for WordPress-powered businesses that need reliable, auto-scalable hosting. Cloudways SafeUpdates now available.
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆44Mar 6, 2026Updated 3 weeks ago
- [ICCV 2025] AdsQA: Towards Advertisement Video Understanding Arxiv: https://arxiv.org/abs/2509.08621☆34Oct 30, 2025Updated 5 months ago
- 洛谷 API 文档☆14Nov 15, 2025Updated 4 months ago
- [ICCV 2025] StreamMind: Unlocking Full Frame Rate Streaming Video Dialogue through Event-Gated Cognition☆63Jun 25, 2025Updated 9 months ago
- Emergent Visual Grounding in Large Multimodal Models Without Grounding Supervision☆43Oct 19, 2025Updated 5 months ago
- Aligning Agentic World Models via Knowledgeable Experience Learning☆32Jan 25, 2026Updated 2 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year