wgcyeo/WorldMM

[CVPR 2026 Highlight] WorldMM: Dynamic Multimodal Memory Agent for Long Video Reasoning

☆97

Alternatives and similar repositories for WorldMM

Users who are interested in WorldMM are comparing it to the repositories listed below.

xiaoqian-shen / Vgent
[NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent
☆48 • Nov 30, 2025 • Updated 7 months ago
facebookresearch / egagent
Code for "Agentic Very Long Video Understanding" (EGAgent) [ACL 2026 Main]
☆49 • Jul 1, 2026 • Updated 3 weeks ago
ShareLab-SII / FluxMem
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
☆73 • Mar 16, 2026 • Updated 4 months ago
EliSpectre / MM-Mem
[ACL 2026 Main] From Verbatim to Gist: Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video A…
☆39 • Apr 19, 2026 • Updated 3 months ago
egolife-ai / Ego-R1
[TPAMI 2026] Ego-R1: Agentic Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆165 • Jun 10, 2026 • Updated last month
ByteDance-Seed / m3-agent
☆1,423 • Feb 12, 2026 • Updated 5 months ago
cg1177 / Recursive-Multimodal-Agent
☆19 • Jul 1, 2026 • Updated 3 weeks ago
MILVLG / videoarm
☆27 • Apr 9, 2026 • Updated 3 months ago
Sid2697 / HOI-Ref
Code implementation for the paper "HOI-Ref: Hand-Object Interaction Referral in Egocentric Vision"
☆30 • Apr 16, 2024 • Updated 2 years ago
jylins / videoseek
[CVPR 2026] VideoSeek: Long-Horizon Video Agent with Tool-Guided Seeking
☆64 • Mar 23, 2026 • Updated 4 months ago
microsoft / DeepVideoDiscovery
Deep Video Discovery (DVD) is a deep-research-style question answering agent designed for understanding extra-long videos.
☆403 • Nov 3, 2025 • Updated 8 months ago
EvolvingLMMs-Lab / SimpleStream
A simple video streaming baseline that outperforms SOTAs.
☆151 • May 1, 2026 • Updated 2 months ago
worldbench / VideoLucy
[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding
☆68 • Feb 10, 2026 • Updated 5 months ago
bethgelab / supersanity
A critical analysis of the Cambrian-S model and VSI-Super benchmarks
☆16 • Nov 20, 2025 • Updated 8 months ago
mll-lab-nu / TStar
TStar is a unified temporal search framework for long-form video question answering
☆97 • Mar 23, 2026 • Updated 4 months ago
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆102 • Oct 15, 2025 • Updated 9 months ago
Becomebright / ReKV
[ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
☆122 • Nov 4, 2025 • Updated 8 months ago
lern-to-write / STC
[CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
☆70 • Jun 8, 2026 • Updated last month
Haiyang0226 / Symphony
Code for the CVPR 2026 paper Symphony
☆17 • Apr 7, 2026 • Updated 3 months ago
lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50 • Oct 9, 2025 • Updated 9 months ago
EvolvingLMMs-Lab / EgoLife
[CVPR 2025] EgoLife: Towards Egocentric Life Assistant
☆450 • Mar 19, 2025 • Updated last year
jylins / hourllava
[NeurIPS 2025 Spotlight] Unleashing Hour-Scale Video Training for Long Video-Language Understanding
☆19 • Jun 24, 2025 • Updated last year
SooLab / EyeWO
[NeurIPS 2025] The official PyTorch implementation of "Eyes Wide Open: Ego Proactive Video-LLM for Streaming Video".
☆34 • Dec 25, 2025 • Updated 6 months ago
dibschat / ProVideLLM
[ICCV 2025] Streaming VideoLLMs for Real-time Procedural Video Understanding
☆18 • Oct 26, 2025 • Updated 8 months ago
xinyouu / V-CAST
V-CAST: Video Curvature-Aware Spatio-Temporal Pruning for Efficient Video Large Language Models
☆34 • Apr 16, 2026 • Updated 3 months ago
64327069 / LVAgent
Code of LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents
☆39 • Nov 24, 2025 • Updated 8 months ago
TeleAI-UAGI / TeleEgo
The official repo of TeleEgo - A Benchmark for Egocentric AI Assistants.
☆63 • Jul 7, 2026 • Updated 2 weeks ago
mbzuai-oryx / LongShOT
A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos
☆21 • Jun 20, 2026 • Updated last month
Mark12Ding / Dispider
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
☆180 • Mar 23, 2025 • Updated last year
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆228 • Dec 19, 2025 • Updated 7 months ago
hmxiong / StreamChat
Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" (ICLR 2025)
☆111 • Mar 14, 2025 • Updated last year
sail-sg / Video-Next-Event-Prediction
☆28 • Aug 9, 2025 • Updated 11 months ago
xuyang-liu16 / GlobalCom2
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆42 • Jan 27, 2026 • Updated 5 months ago
LunarShen / FastVID
[NeurIPS 2025] FastVID: Dynamic Density Pruning for Fast Video Large Language Models
☆37 • Nov 10, 2025 • Updated 8 months ago
yeliudev / VideoMind
🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆348 • Feb 8, 2026 • Updated 5 months ago
zjuruizhechen / Awesome-Video-Agent
A collection of awesome thinking-with-videos papers.
☆100 • Dec 1, 2025 • Updated 7 months ago
Leon1207 / Video-RAG-master
✨✨ [NeurIPS 2025] This is the official implementation of our paper "Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehensi…
☆446 • Jun 26, 2026 • Updated 3 weeks ago
cokeshao / HoliTom
[NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models
☆84 • Oct 10, 2025 • Updated 9 months ago
zsgvivo / VideoZoomer
☆34 • Feb 12, 2026 • Updated 5 months ago