[ICCV 2025] Implementation of the paper "Q-Frame: Query-aware Frame Selection and Multi-Resolution Adaptation for Video-LLMs"
☆71 · Oct 25, 2025 · Updated 5 months ago
Alternatives and similar repositories for q-frame
Users that are interested in q-frame are comparing it to the libraries listed below.
- [SIGCOMM 2023] PacketGame: Multi-Stream Packet Gating for Concurrent Video Inference at Scale ☆14 · Jul 1, 2023 · Updated 2 years ago
- [CVPR2026] VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding ☆96 · Mar 17, 2026 · Updated last week
- Official implementation of "TailorKV: A Hybrid Framework for Long-Context Inference via Tailored KV Cache Optimization" (Findings of ACL … ☆21 · Jul 25, 2025 · Updated 8 months ago
- [ICCV 2025] GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding ☆75 · Jun 26, 2025 · Updated 9 months ago
- JoVA: Unified Multimodal Learning for Joint Video-Audio Generation ☆30 · Dec 22, 2025 · Updated 3 months ago
- [TCSVT 2024] Implementation of the paper "SiT-MLP: A Simple MLP with Point-wise Topology Feature Learning for Skeleton-based Action Recog… ☆19 · Apr 10, 2024 · Updated last year
- ☆31 · Nov 1, 2023 · Updated 2 years ago
- SurgLaVi: Official repository ☆29 · Mar 4, 2026 · Updated 3 weeks ago
- Re-implementation of SLAM-ASR paper's experiment, using Phi-2 and Hubert ☆21 · Jun 14, 2024 · Updated last year
- Official implementation of CMMCoT: Enhancing Complex Multi-Image Comprehension via Multi-Modal Chain-of-Thought and Memory Augmentation ☆12 · Dec 5, 2025 · Updated 3 months ago
- [ICCV 2025] Object-centric Video Question Answering with Visual Grounding and Referring ☆25 · Aug 8, 2025 · Updated 7 months ago
- Technical Challenge Repository for Visual Anomaly Detection Workshop (VAND) at CVPR ☆13 · Jul 21, 2025 · Updated 8 months ago
- Inception-I3D, Non Local finetune, hmdb51_flow ☆15 · Oct 15, 2019 · Updated 6 years ago
- OmniMamba: Efficient and Unified Multimodal Understanding and Generation via State Space Models ☆143 · Apr 25, 2025 · Updated 11 months ago
- We introduce CausalVQA, a benchmark dataset for video question answering (VQA) composed of question-answer pairs that probe models’ under… ☆55 · Aug 18, 2025 · Updated 7 months ago
- ☆29 · Feb 27, 2025 · Updated last year
- ☆11 · Jan 18, 2024 · Updated 2 years ago
- Reinforcing Action Policies by Prophesying ☆40 · Nov 26, 2025 · Updated 4 months ago
- Full-frequency dynamic convolution: a physical frequency-dependent convolution for sound event detection ☆27 · Aug 22, 2024 · Updated last year
- [CVPR 2023] Better “CMOS” Produces Clearer Images: Learning Space-Variant Blur Estimation for Blind Image Super-Resolution ☆10 · Mar 19, 2024 · Updated 2 years ago
- [ICML'25 Spotlight] Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models ☆48 · Jan 21, 2026 · Updated 2 months ago
- [MICCAI 2022] Toward Clinically Assisted Colorectal Polyp Recognition via Structured Cross-modal Representation Consistency ☆12 · Nov 8, 2024 · Updated last year
- LLaVA-VLA: A Simple Yet Powerful Vision-Language-Action Model [ICRA 2026] ☆185 · Mar 12, 2026 · Updated 2 weeks ago
- Low-Latency Live Video Streaming over a Low-Earth-Orbit Satellite Network with DASH ☆18 · Sep 6, 2024 · Updated last year
- Open-source audio embedding models, submitted to the HEAR 2021 challenge ☆11 · Feb 15, 2026 · Updated last month
- Transferring Genshin PVs into a freehand style with Diffusion Model. ☆10 · Jun 5, 2024 · Updated last year
- The official repo of the paper titled DeH4R: A Decoupled and Hybrid Method for Road Network Graph Extraction. ☆22 · Dec 1, 2025 · Updated 3 months ago
- ☆27 · Aug 2, 2023 · Updated 2 years ago
- Multi-step reasoning MLLM ☆16 · Mar 8, 2026 · Updated 2 weeks ago
- [ACL 2025] ⚖️ Temporally-aware MLLM for Biomedical Radiology Analysis and Report Generation. Flexible toolkit with MLLM backbone support,… ☆28 · Mar 18, 2026 · Updated last week
- Panoramic Out-of-Distribution Segmentation ☆15 · Dec 21, 2025 · Updated 3 months ago
- Prototyp MegaScale-Infer: Serving Mixture-of-Experts at Scale with Disaggregated Expert Parallelism ☆27 · Apr 4, 2025 · Updated 11 months ago
- ☆18 · Oct 22, 2024 · Updated last year
- [AAAI 2026] SIFThinker: Spatially-Aware Image Focus for Visual Reasoning ☆23 · Dec 2, 2025 · Updated 3 months ago
- This repo contains the code for our paper Towards Open-Ended Visual Recognition with Large Language Model ☆100 · Jul 15, 2024 · Updated last year
- [ECCV2024] PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects ☆58 · Sep 17, 2024 · Updated last year
- A simple command line tool to calculate WER for ASR. ☆14 · Oct 14, 2024 · Updated last year
- Code For Our Work: DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries [ECCV-2024] ☆14 · Jul 11, 2024 · Updated last year
- FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition ☆33 · Nov 29, 2024 · Updated last year