KangsanKim07 / VideoICLView external linksLinks
[CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding
☆24Mar 24, 2025Updated 10 months ago
Alternatives and similar repositories for VideoICL
Users that are interested in VideoICL are comparing it to the libraries listed below
Sorting:
- ☆16May 12, 2025Updated 9 months ago
- ☆13Apr 25, 2025Updated 9 months ago
- ResearchAgent☆23Aug 24, 2025Updated 5 months ago
- [ICML 2025] Code for "R2-T2: Re-Routing in Test-Time for Multimodal Mixture-of-Experts"☆19Mar 10, 2025Updated 11 months ago
- ☆24May 13, 2025Updated 9 months ago
- [ICLR 2025] Large (Vision) Language Models are Unsupervised In-Context Learners☆22Jun 6, 2025Updated 8 months ago
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆42Jun 2, 2025Updated 8 months ago
- ☆23Sep 29, 2024Updated last year
- [COLM 2025] JailDAM: Jailbreak Detection with Adaptive Memory for Vision-Language Model☆25Nov 25, 2025Updated 2 months ago
- iLLaVA: An Image is Worth Fewer Than 1/3 Input Tokens in Large Multimodal Models☆21Jan 29, 2025Updated last year
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated 11 months ago
- [CVPR 2024] How to Configure Good In-Context Sequence for Visual Question Answering☆21May 28, 2025Updated 8 months ago
- Generated Faces in the Wild Dataset and Code☆18Mar 2, 2025Updated 11 months ago
- [EMNLP 2025 Findings] Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models☆139Aug 21, 2025Updated 5 months ago
- spatio-temporal tasks☆15Jul 15, 2024Updated last year
- [ICLR 2025] TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning☆90Apr 7, 2025Updated 10 months ago
- [ICML'25] Official code of paper "Fast Large Language Model Collaborative Decoding via Speculation"☆28Jun 23, 2025Updated 7 months ago
- A Text2SQL benchmark for evaluation of Large Language Models☆41Updated this week
- Domain Adaptation paper in various applications☆22Jan 20, 2020Updated 6 years ago
- VideoAuto-R1: Video Auto Reasoning via Thinking Once, Answering Twice☆61Jan 9, 2026Updated last month
- ☆28Feb 10, 2025Updated last year
- The first spoken long-text dataset derived from live streams, designed to reflect the redundancy-rich and conversational nature of real-w…☆13Jun 28, 2025Updated 7 months ago
- The official repository of "SmartAgent: Chain-of-User-Thought for Embodied Personalized Agent in Cyber World".☆27Aug 20, 2025Updated 5 months ago
- [EMNLP-2025 Oral] ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration☆72Nov 20, 2025Updated 2 months ago
- Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model☆81Nov 27, 2025Updated 2 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"☆43Apr 3, 2025Updated 10 months ago
- [ICCV'25] HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics☆38Sep 10, 2025Updated 5 months ago
- ☆18Jun 10, 2025Updated 8 months ago
- [NeurIPS ENLSP Workshop'24] CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios☆16Oct 18, 2024Updated last year
- [ICLR 2025] VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning☆70Sep 20, 2025Updated 4 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆41Feb 12, 2025Updated last year
- Symphony — A decentralized multi-agent framework that enables intelligent agents to collaborate seamlessly across heterogeneous edge devi…☆30Oct 30, 2025Updated 3 months ago
- [ICML 2024] Official Repository for the paper "Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models"☆10Jul 19, 2024Updated last year
- A Framework for Evaluating AI Agent Safety in Realistic Environments☆30Oct 2, 2025Updated 4 months ago
- Official Code Repository for the paper "KALA: Knowledge-Augmented Language Model Adaptation" (NAACL 2022)☆35Oct 17, 2023Updated 2 years ago
- [KDD 2026 ADS Track] Pytorch implementation of the paper "Hi-Guard: Towards Trustworthy Multimodal Moderation via Policy-Aligned Reasonin…☆19Jan 13, 2026Updated last month
- A CNN-BiLSTM model for Li-ion battery state of health and remaining useful life prediction☆11Mar 25, 2024Updated last year
- ☆11Jun 22, 2025Updated 7 months ago
- The official implement of paper 《DaMo: Data Mixing Optimizer in Fine-tuning Multimodal LLMs for Mobile Phone Agents》☆28Oct 23, 2025Updated 3 months ago