FeiElysia/Tempo

Tempo: Small Vision-Language Models are Smart Compressors for Long Video Understanding, ECCV 2026

☆77

Alternatives and similar repositories for Tempo

Users who are interested in Tempo are comparing it to the repositories listed below.

zjr2000 / REVERIE
[ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models
☆20 · Jul 17, 2024 · Updated 2 years ago
NIneeeeeem / LangDC
[EMNLP 2025 Oral] Official codebase for Seeing More, Saying More: Lightweight Language Experts are Dynamic Video Token Compressors.
☆18 · Sep 7, 2025 · Updated 10 months ago
rootyJeon / Vision-aligned-Latent-Reasoning
[ICML 2026] Official implementation of Vision-aligned Latent Reasoning for Multi-modal Large Language Model (VaLR)
☆20 · Apr 30, 2026 · Updated 2 months ago
THUMAI-Lab / LLaVA-UHD-v4
☆47 · Jun 7, 2026 · Updated last month
EvolvingLMMs-Lab / SimpleStream
A simple video streaming baseline that outperforms SOTAs.
☆151 · May 1, 2026 · Updated 2 months ago
IVUL-KAUST / VideoAuto-R1
[CVPR2026] VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice
☆88 · Feb 27, 2026 · Updated 4 months ago
TencentARC / ARC-Chapter
Structuring Hour-Long Videos into Navigable Chapters and Hierarchical Summaries
☆44 · Nov 19, 2025 · Updated 8 months ago
ubc-tea / FedSoup
The official PyTorch implementation of paper "FedSoup: Improving Generalization and Personalization in Federated Learning via Selective M…
☆18 · Apr 14, 2024 · Updated 2 years ago
LJungang / Awesome-Video-Reasoning-Landscape
🔥An open-source survey of the latest video reasoning tasks, paradigms, and benchmarks.
☆189 · Jun 14, 2026 · Updated last month
city1517 / FlexMem
[CVPR2026 Highlight] FlexMem: Scaling the Long Video Understanding of MLLMs via Visual Memory Mechanism
☆29 · Apr 10, 2026 · Updated 3 months ago
marinero4972 / Open-o3-Video
[ICML 2026] Official implementation of "Open-o3 Video: Grounded Video Reasoning with Explicit Spatio-Temporal Evidence"
☆157 · May 1, 2026 · Updated 2 months ago
ttgeng233 / UnAV
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
☆73 · Jan 4, 2026 · Updated 6 months ago
aiha-lab / InfiniPot-V
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20 · Jan 25, 2026 · Updated 5 months ago
TencentARC / TimeLens
[CVPR 2026] TimeLens: Rethinking Video Temporal Grounding with Multimodal LLMs
☆162 · Updated this week
zjr2000 / SPES
Official Implementation for paper "Pretraining A Large Language Model using Distributed GPUs: A Memory-Efficient Decentralized Paradigm"
☆23 · May 8, 2026 · Updated 2 months ago
ttgeng233 / UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
☆33 · Apr 19, 2024 · Updated 2 years ago
snap-research / EgoEdit
[CVPR 2026] 👋 Dataset and Benchmark code for EgoEdit
☆155 · Apr 5, 2026 · Updated 3 months ago
Jialuo-Li / DIG
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21 · Feb 21, 2026 · Updated 5 months ago
wbfwonderful / Vad-R1
[NeurIPS 2025] Official repository for "Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought".
☆31 · Jan 30, 2026 · Updated 5 months ago
NJU-LINK / MT-Video-Bench
The Source Code for MT-Video-Bench @ ACL Findings 2026
☆22 · Jan 20, 2026 · Updated 6 months ago
OpenGVLab / VKnowU
[ECCV 2026] VKnowU: Evaluating Visual Knowledge Understanding in Multimodal LLMs
☆16 · Feb 3, 2026 · Updated 5 months ago
Yioutpi / Awesome-3D-Understanding
☆13 · Jul 22, 2024 · Updated 2 years ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆228 · Dec 19, 2025 · Updated 7 months ago
computer-net / MIT-6.S081-2020
Lab code repository for MIT 6.S081 (2020)
☆18 · Jan 17, 2022 · Updated 4 years ago
zjr2000 / LLMVA-GEBC
Winner solution to Generic Event Boundary Captioning task in LOVEU Challenge (CVPR 2023 workshop)
☆29 · Jan 1, 2024 · Updated 2 years ago
zjr2000 / GVL
Official implementation for paper Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
☆28 · Dec 8, 2023 · Updated 2 years ago
zhang9302002 / ThinkingWithVideos
The official code of "Thinking With Videos: Multimodal Tool-Augmented Reinforcement Learning for Long Video Reasoning"
☆102 · Oct 15, 2025 · Updated 9 months ago
groolegend / EgoSound
Official release of 'EgoSound: Benchmarking Sound Understanding in Egocentric Videos' [CVPR 2026 Highlight]
☆52 · Apr 30, 2026 · Updated 2 months ago
zjr2000 / Awesome-Multimodal-Chatbot
Awesome Multimodal Assistant is a curated list of multimodal chatbots/conversational assistants that utilize various modes of interaction…
☆79 · Jun 18, 2023 · Updated 3 years ago
daeunni / Video-Skill-CoT
Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]"
☆18 · Aug 27, 2025 · Updated 10 months ago
yaolinli / MLLM-Token-Compression
Towards Efficient Multimodal Large Language Models: A Survey on Token Compression
☆212 · Jun 29, 2026 · Updated 3 weeks ago
knightyxp / VideoCoF
[CVPR 2026 Highlight] VideoCoF: Unified Video Editing with Temporal Reasoner
☆204 · Jun 17, 2026 · Updated last month
vivoCameraResearch / Anchor-Forcing
[ECCV2026] Anchor Forcing is a cache-centric framework for interactive streaming video generation that preserves visual quality and cohe…
☆26 · May 4, 2026 · Updated 2 months ago
MAC-AutoML / QuoTA
✨✨[AAAI 2026] This is the official implementation of our paper "QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Vi…
☆79 · Apr 28, 2025 · Updated last year
EvolvingLMMs-Lab / LongVT
[CVPR 2026] LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling
☆255 · Jun 24, 2026 · Updated last month
xiaoqian-shen / Vgent
[NeurIPS 2025 Spotlight] Official PyTorch implementation of Vgent
☆48 · Nov 30, 2025 · Updated 7 months ago
SCZwangxiao / video-ReTaKe
Official implementation of paper ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding
☆40 · Mar 16, 2025 · Updated last year
zihuixue / ProgCaptioner
Code release for the paper "Progress-Aware Video Frame Captioning" (CVPR 2025)
☆26 · Jul 16, 2025 · Updated last year
MCG-NJU / StreamForest
[NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory
☆133 · Nov 4, 2025 · Updated 8 months ago