sming256/BOLT

[CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding

☆55

Alternatives and similar repositories for BOLT

Users who are interested in BOLT are comparing it to the repositories listed below.

Jialuo-Li / DIG
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21 · Feb 21, 2026 · Updated 5 months ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆228 · Dec 19, 2025 · Updated 7 months ago
MAC-AutoML / WFS-SB
[CVPR 2026] Wavelet-based Frame Selection by Detecting Semantic Boundary for Long Video Understanding
☆32 · Apr 12, 2026 · Updated 3 months ago
NVlabs / FRAG
☆15 · Apr 25, 2025 · Updated last year
xiaomi-research / q-frame
[ICCV 2025] Implementation of the paper "Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs"
☆82 · Oct 25, 2025 · Updated 9 months ago
lern-to-write / STC
[CVPR 2026] Accelerating Streaming Video Large Language Models via Hierarchical Token Compression
☆70 · Jun 8, 2026 · Updated last month
cokeshao / HoliTom
[NeurIPS 2025] HoliTom: Holistic Token Merging for Fast Video Large Language Models
☆84 · Oct 10, 2025 · Updated 9 months ago
ShareLab-SII / FluxMem
[CVPR 2026] FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding
☆73 · Mar 16, 2026 · Updated 4 months ago
HYUNJS / STTM
[ICCV 2025] Multi-Granular Spatio-Temporal Token Merging for Training-Free Acceleration of Video LLMs
☆61 · Feb 2, 2026 · Updated 5 months ago
KD-TAO / DyCoke
[CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
☆113 · Nov 22, 2025 · Updated 8 months ago
xuyang-liu16 / VidCom2
[EMNLP 2025 Main] Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
☆127 · May 14, 2026 · Updated 2 months ago
MAC-AutoML / SpecEyes
[ECCV 2026🔥] This is the official implementation of our paper "SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception…
☆62 · Apr 2, 2026 · Updated 3 months ago
Espere-1119-Song / Video-MMLU
A Massive Multi-Discipline Lecture Understanding Benchmark
☆34 · Apr 20, 2026 · Updated 3 months ago
JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆154 · Jul 24, 2025 · Updated last year
yeliudev / VideoMind
🧠 VideoMind: A Chain-of-LoRA Agent for Temporal-Grounded Video Reasoning (ICLR 2026)
☆350 · Feb 8, 2026 · Updated 5 months ago
ruili33 / TPO
☆41 · Sep 9, 2025 · Updated 10 months ago
JIA-Lab-research / LSDBench
A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o…
☆28 · Aug 7, 2025 · Updated 11 months ago
mll-lab-nu / TStar
TStar is a unified temporal search framework for long-form video question answering
☆97 · Mar 23, 2026 · Updated 4 months ago
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
yuanc3 / DATE
Use 2 lines to empower absolute time awareness for Qwen2.5VL's MRoPE
☆29 · Sep 20, 2025 · Updated 10 months ago
MCG-NJU / StreamForest
[NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory
☆133 · Nov 4, 2025 · Updated 8 months ago
May2333 / FDCA
[ICLR 2025] This repo is the official implementation of our paper "Learning Fine-Grained Representations through Textual Token Disentangl…
☆23 · Jul 28, 2025 · Updated 11 months ago
KHU-VLL / DEVIAS
[ECCV 2024 Oral] Official implementation of the paper "DEVIAS: Learning Disentangled Video Representations of Action and Scene"
☆29 · Nov 15, 2025 · Updated 8 months ago
CG-Bench / CG-Bench
☆20 · Jan 26, 2025 · Updated last year
gls0425 / LinVT
LinVT: Empower Your Image-level Large Language Model to Understand Videos
☆84 · Dec 30, 2024 · Updated last year
Raphoo / linear-mech-vlms
Code for "Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models"
☆15 · Feb 16, 2026 · Updated 5 months ago
cokeshao / Awesome-Multimodal-Token-Compression
[TMLR 2026] Survey: https://arxiv.org/pdf/2507.20198
☆371 · May 29, 2026 · Updated last month
AIM-SKKU / QA-TIGER
Question-Aware Gaussian Experts for Audio-Visual Question Answering: Official PyTorch Implementation (CVPR'25, Highlight)
☆29 · Jun 6, 2025 · Updated last year
md-mohaiminul / BIMBA
☆29 · Jul 25, 2025 · Updated last year
zhousheng97 / EgoTextVQA
[CVPR'25] 🌟🌟 EgoTextVQA: Towards Egocentric Scene-Text Aware Video Question Answering
☆52 · Jun 19, 2025 · Updated last year
yaolinli / TimeChat-Online
[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆132 · Jun 29, 2026 · Updated 3 weeks ago
tychen-SJTU / MECD-Benchmark
[NeurIPS'24 spotlight] MECD: Unlocking Multi-Event Causal Discovery in Video Reasoning. [TPAMI'25] MECD+
☆50 · Feb 11, 2026 · Updated 5 months ago
egolife-ai / Ego-R1
[TPAMI 2026] Ego-R1: Agentic Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆165 · Jun 10, 2026 · Updated last month
thu-nics / FrameFusion
[ICCV'25] The official code of paper "Combining Similarity and Importance for Video Token Reduction on Large Visual Language Models"
☆76 · Jan 13, 2026 · Updated 6 months ago
EliSpectre / MM-Mem
[ACL-26 (main)] From Verbatim to Gist Distilling Pyramidal Multimodal Memory via Semantic Information Bottleneck for Long-Horizon Video A…
☆39 · Apr 19, 2026 · Updated 3 months ago
Koreyoshi01 / VISD
This repository is the official implementation for VISD.
☆22 · May 17, 2026 · Updated 2 months ago
Fanziyang-v / FlashVID
[ICLR 2026 Oral] FlashVID: Efficient Video Large Language Models via Training-free Tree-based Spatiotemporal Token Merging
☆113 · Apr 30, 2026 · Updated 2 months ago
mit-han-lab / streaming-vlm
StreamingVLM: Real-Time Understanding for Infinite Video Streams
☆1,047 · Oct 15, 2025 · Updated 9 months ago
worldbench / VideoLucy
[NeurIPS 2025] Deep Memory Backtracking for Long Video Understanding
☆68 · Feb 10, 2026 · Updated 5 months ago