[CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
☆38Feb 5, 2026Updated last month
Alternatives and similar repositories for BOLT
Users that are interested in BOLT are comparing it to the libraries listed below
Sorting:
- TStar is a unified temporal search framework for long-form video question answering☆88Sep 2, 2025Updated 6 months ago
- ☆17Jan 26, 2025Updated last year
- A benchmark that focuses on the sampling dilemma in long-video tasks. Through well-designed tasks, it evaluates the sampling efficiency o…☆27Aug 7, 2025Updated 6 months ago
- Official repository of the video reasoning benchmark MMR-V. Can Your MLLMs "Think with Video"?☆38Jun 23, 2025Updated 8 months ago
- [Arxiv 2025] ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions☆45Jun 11, 2025Updated 8 months ago
- This is the official implementation of ReVisionLLM: Recursive Vision-Language Model for Temporal Grounding in Hour-Long Videos☆43Nov 5, 2025Updated 3 months ago
- ✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models☆43Apr 10, 2025Updated 10 months ago
- (ICCV2025) Official repository of paper "ViSpeak: Visual Instruction Feedback in Streaming Videos"☆45Jul 1, 2025Updated 8 months ago
- [ICCV 2025 Oral] Official implementation of Learning Streaming Video Representation via Multitask Training.☆84Dec 24, 2025Updated 2 months ago
- [CVPR 2024] LoSh: Long-Short Text Joint Prediction Network for Referring Video Object Segmentation☆13Jun 17, 2024Updated last year
- 基于pdfium的pdf/ofd双引擎解析渲染引擎☆14Oct 15, 2024Updated last year
- ☆14Aug 28, 2024Updated last year
- 在监控画质下实现对校园自行车的重识别,包含REID模型识别,向量数据库检索,UI展示☆10Feb 13, 2024Updated 2 years ago
- Official Implementation of DART (DART: Diffusion-Inspired Speculative Decoding for Fast LLM Inference).☆44Feb 8, 2026Updated 3 weeks ago
- Implementation for "StyleGAN-Canvas: Augmenting StyleGAN3 for Real-Time Human-AI Co-Creation"☆12May 24, 2023Updated 2 years ago
- Official code for paper: F3D-Gaus: Feed-forward 3D-aware Generation on ImageNet with Cycle-Aggregative Gaussian Splatting☆50Mar 11, 2025Updated 11 months ago
- [ACM-MM 2025 Workshop] More Is Better: A MoE-Based Emotion Recognition Framework with Human Preference Alignment.☆25Nov 25, 2025Updated 3 months ago
- Source codes for the paper "Personalized Dynamic Music Emotion Recognition with Dual-Scale Attention-Based Meta-Learning" (PDMER) which p…☆14Mar 24, 2025Updated 11 months ago
- Quantized Attention on GPU☆44Nov 22, 2024Updated last year
- 大数据生态解决方案基础平台: 搜索系统、公共系统、任务管理系统、数据binlog采集、基础爬虫系统、数据传输系统、运维告警系统、APM、报表系统☆11Jan 25, 2021Updated 5 years ago
- 用于自动预约民政局婚姻登记处的号,限广东省民政局☆10Jun 25, 2023Updated 2 years ago
- ☆10Mar 31, 2025Updated 11 months ago
- This project is a demonstration of a content-based recommendation system for Spotify that leverages user's preferences and audio features…☆17Apr 4, 2023Updated 2 years ago
- Implementation of SoundtStream from the paper: "SoundStream: An End-to-End Neural Audio Codec"☆13Jan 27, 2025Updated last year
- [ECCV 2024] The first zero-shot setting for spatio-temporal video grounding.☆11Jul 16, 2024Updated last year
- EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation☆27Jul 30, 2025Updated 7 months ago
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding☆181Dec 19, 2025Updated 2 months ago
- ☆16Sep 1, 2025Updated 6 months ago
- LongCTR: A Long Sequence Modeling Benchmark for CTR Prediction☆17Jun 21, 2025Updated 8 months ago
- Web app for makeup transfer using Stable Diffusion☆10Sep 11, 2023Updated 2 years ago
- Official Implementation of DMT: Dual Mean-Teacher in PyTorch.☆10Oct 27, 2023Updated 2 years ago
- UnicEdit-10M and UnicBench project☆23Updated this week
- Official repository of paper "LOVE-R1: Advancing Long Video Understanding with Adaptive Zoom-in Mechanism via Multi-Step Reasoning"☆20Nov 1, 2025Updated 4 months ago
- ☆13Aug 7, 2025Updated 6 months ago
- TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs☆23Sep 21, 2025Updated 5 months ago
- [ICCV 2025] Official repository of the paper "BlinkTrack: Feature Tracking over 80 FPS via Events and Images". This repository contains t…☆11Aug 27, 2025Updated 6 months ago
- [CVPR'25] Official code of paper "Mimic In-Context Learning for Multimodal Tasks"☆24Jun 8, 2025Updated 8 months ago
- ☆11Jul 2, 2022Updated 3 years ago
- ☆10Feb 22, 2023Updated 3 years ago