[ICCV 2025] Implementation of the paper "Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs"
☆74 · Oct 25, 2025 · Updated 5 months ago
Alternatives and similar repositories for q-frame
Users that are interested in q-frame are comparing it to the libraries listed below.
- Boosting the Class-Incremental Learning in 3D Point Clouds via Zero-Collection-Cost Basic Shape Pre-Training ☆13 · Nov 30, 2024 · Updated last year
- [CVPR2026] VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding ☆112 · Mar 26, 2026 · Updated 3 weeks ago
- Source Code for Captionomaly: A Deep Learning Toolbox for Anomaly Captioning in Surveillance Videos ☆13 · Jun 26, 2023 · Updated 2 years ago
- [ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding ☆76 · Jun 26, 2025 · Updated 9 months ago
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation ☆29 · Dec 22, 2025 · Updated 3 months ago
- [TCSVT 2024] Implementation of the paper "SiT-MLP: A Simple MLP with Point-wise Topology Feature Learning for Skeleton-based Action Recog… ☆19 · Apr 10, 2024 · Updated 2 years ago
- ☆23 · Jan 8, 2024 · Updated 2 years ago
- ☆31 · Nov 1, 2023 · Updated 2 years ago
- ☆15 · Jul 9, 2019 · Updated 6 years ago
- SurgLaVi: Official repository ☆30 · Mar 4, 2026 · Updated last month
- [ICLR 2026] The official implementation of "Dichotomous Diffusion Policy Optimization" ☆28 · Mar 6, 2026 · Updated last month
- [AAAI'26] Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augm… ☆11 · Dec 5, 2025 · Updated 4 months ago
- [ICCV 2025] Object-centric Video Question Answering with Visual Grounding and Referring ☆25 · Aug 8, 2025 · Updated 8 months ago
- Technical Challenge Repository for Visual Anomaly Detection Workshop (VAND) at CVPR ☆13 · Jul 21, 2025 · Updated 8 months ago
- Inception-I3D, Non Local finetune, hmdb51_flow ☆15 · Oct 15, 2019 · Updated 6 years ago
- ☆15 · Sep 28, 2023 · Updated 2 years ago
- ☆10 · Aug 1, 2021 · Updated 4 years ago
- [ICCV 2023] CTVIS: Consistent Training for Online Video Instance Segmentation ☆81 · Oct 15, 2023 · Updated 2 years ago
- ☆29 · Feb 27, 2025 · Updated last year
- An implementation of MSSRM method ☆11 · Mar 23, 2023 · Updated 3 years ago
- 📚 A curated collection of papers and open-source code repositories dedicated to the application of Vision-Language Models (VLMs) for str… ☆114 · Updated this week
- ☆11 · Jan 18, 2024 · Updated 2 years ago
- Reinforcing Action Policies by Prophesying ☆40 · Nov 26, 2025 · Updated 4 months ago
- [CVPR 2023] Better “CMOS” Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution ☆10 · Mar 19, 2024 · Updated 2 years ago
- Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection ☆27 · Aug 22, 2024 · Updated last year
- We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models’ under… ☆59 · Aug 18, 2025 · Updated 8 months ago
- [ICML'25 Spotlight] Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models ☆51 · Jan 21, 2026 · Updated 2 months ago
- [MICCAI 2022] Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency ☆12 · Nov 8, 2024 · Updated last year
- LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [ICRA 2026] ☆188 · Mar 12, 2026 · Updated last month
- ☆18 · Apr 10, 2025 · Updated last year
- [ACL 2025] ⚖️ Temporally-aware MLLM for Biomedical Radiology Analysis and Report Generation. Flexible toolkit with MLLM backbone support,… ☆29 · Mar 18, 2026 · Updated last month
- Official codebase for Fast-WAM: Do World Action Models Need Test-time Future Imagination? ☆490 · Apr 3, 2026 · Updated 2 weeks ago
- Panoramic Out-of-Distribution Segmentation ☆15 · Dec 21, 2025 · Updated 3 months ago
- [AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning ☆22 · Dec 2, 2025 · Updated 4 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆100 · Jul 15, 2024 · Updated last year
- A simple command line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024] ☆14 · Jul 11, 2024 · Updated last year
- ☆42 · Jan 17, 2026 · Updated 3 months ago
- ☆12 · Mar 28, 2022 · Updated 4 years ago