xiaomi-research/q-frame

[ICCV 2025] Implementation of the paper "Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs"

☆82

Alternatives and similar repositories for q-frame

Users who are interested in q-frame are comparing it to the repositories listed below.

xiaomi-research / btl-ui
[NeurIPS 2025] Implementation of the paper "BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent"
☆19 · Nov 27, 2025 · Updated 8 months ago
chaoqi7 / BSA-CIL-3D
Boosting the Class-Incremental Learning in 3D Point Clouds via Zero-Collection-Cost Basic Shape Pre-Training
☆13 · Nov 30, 2024 · Updated last year
sming256 / BOLT
[CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding
☆55 · Feb 5, 2026 · Updated 5 months ago
ncTimTang / AKS
[CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding
☆228 · Dec 19, 2025 · Updated 7 months ago
Darwin-Agent / awesome-world-models-for-digital-agents
Digital Agents Meet World Models: A Survey
☆50 · May 8, 2026 · Updated 2 months ago
hyungjin-chung / VPS
☆16 · Sep 11, 2025 · Updated 10 months ago
buptlihang / CVLM
☆23 · Jan 8, 2024 · Updated 2 years ago
XenoZLH / Shuffle-R1
Official code repository of Shuffle-R1
☆26 · Feb 23, 2026 · Updated 5 months ago
Jialuo-Li / DIG
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21 · Feb 21, 2026 · Updated 5 months ago
SeerRay-Lab / Xiaomi-GUI-0
[Technical Report] An End-to-End Multimodal GUI Agent for Real Mobile Environments
☆80 · Updated this week
lcqysl / FrameThinker
[ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"
☆50 · Oct 9, 2025 · Updated 9 months ago
xiaomi-research / timeviper
[CVPR'26] TimeViper: A Hybrid Mamba-Transformer Vision-Language Model for Efficient Long Video Understanding
☆25 · Jan 4, 2026 · Updated 6 months ago
yaolinli / GenS
[ACL 2025 Findings] GenS: Generative Frame Sampler for Long Video Understanding
☆22 · Aug 21, 2025 · Updated 11 months ago
XiaoMi / xiaomi-mimo-vl-miloco
Xiaomi MiMo-VL-Miloco
☆223 · Dec 23, 2025 · Updated 7 months ago
zai-org / MotionBench
Official code for MotionBench (CVPR 2025)
☆76 · Mar 3, 2025 · Updated last year
zhshj0110 / SiT-MLP
[TCSVT 2024] Implementation of the paper "SiT-MLP: A Simple MLP with Point-wise Topology Feature Learning for Skeleton-based Action Recog…
☆19 · Apr 10, 2024 · Updated 2 years ago
yunzhuzhang0918 / flexselect
The official repository for paper "FlexSelect: Flexible Token Selection for Efficient Long Video Understanding".
☆31 · Sep 19, 2025 · Updated 10 months ago
DYZhang09 / ViTWSS3D
[ICCV 23] A Simple Vision Transformer for Weakly Semi-supervised 3D Object Detection
☆13 · Apr 12, 2024 · Updated 2 years ago
ttgeng233 / UniAV
Unified Audio-Visual Perception for Multi-Task Video Localization
☆33 · Apr 19, 2024 · Updated 2 years ago
xiaomi-research / dasheng-tokenizer
State-of-the-art continuous audio tokenization
☆40 · Mar 9, 2026 · Updated 4 months ago
zhshj0110 / Awesome-Motion-Diffusion-Models
A collection of resources and papers on Motion Diffusion Models.
☆39 · Jun 10, 2025 · Updated last year
FAVOR-Bench / FAVOR-Bench
Accepted By The 39th Annual Conference on Neural Information Processing Systems Datasets and Benchmarks Track
☆25 · Nov 17, 2025 · Updated 8 months ago
mll-lab-nu / TStar
TStar is a unified temporal search framework for long-form video question answering
☆97 · Mar 23, 2026 · Updated 4 months ago
EvolvingLMMs-Lab / SimpleStream
A simple video streaming baseline that outperforms SOTAs.
☆153 · May 1, 2026 · Updated 2 months ago
OpenGVLab / VRBench
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
☆28 · Jun 4, 2026 · Updated last month
VisionXLab / Moment-Video
☆19 · Jun 2, 2026 · Updated last month
xuyang-liu16 / GlobalCom2
[AAAI 2026] Global Compression Commander: Plug-and-Play Inference Acceleration for High-Resolution Large Vision-Language Models
☆42 · Jan 27, 2026 · Updated 6 months ago
MCG-NJU / StreamForest
[NeurIPS 2025 Spotlight] StreamForest: Efficient Online Video Understanding with Persistent Event Memory
☆133 · Nov 4, 2025 · Updated 8 months ago
hmyao22 / DADF
The official implementation of the paper DADF for industrial VAD
☆13 · Dec 1, 2023 · Updated 2 years ago
xuyang-liu16 / MixKV
[ICLR 2026] Mixing Importance with Diversity: Joint Optimization for KV Cache Compression in Large Vision-Language Models
☆29 · Mar 21, 2026 · Updated 4 months ago
qirui-chen / MultiHop-EgoQA
[AAAI 2025] Grounded Multi-Hop VideoQA in Long-Form Egocentric Videos
☆38 · May 27, 2025 · Updated last year
cvpr-vand / challenge
Technical Challenge Repository for Visual Anomaly Detection Workshop (VAND) at CVPR
☆14 · Jul 21, 2025 · Updated last year
NVlabs / VideoITG
[CVPR 2026 Highlight] VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding
☆126 · Apr 17, 2026 · Updated 3 months ago
Becomebright / ReKV
[ICLR'25] Streaming Video Question-Answering with In-context Video KV-Cache Retrieval
☆122 · Nov 4, 2025 · Updated 8 months ago
H-EmbodVis / HERMESV2
HERMES++: Toward a Unified Driving World Model for 3D Scene Understanding and Generation
☆66 · Updated this week
tulerfeng / Video-R1
Video-R1: Reinforcing Video Reasoning in MLLMs [🔥the first paper to explore R1 for video]
☆884 · Dec 14, 2025 · Updated 7 months ago
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140 · Jul 28, 2025 · Updated last year
yaolinli / TimeChat-Online
[ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
☆132 · Jun 29, 2026 · Updated last month
pro-assist / ProAssist
☆20 · Jul 21, 2025 · Updated last year