JiuTian-VL / LION-FS
[CVPR 2025] LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant
☆18 Updated 3 weeks ago
Alternatives and similar repositories for LION-FS
Users who are interested in LION-FS are comparing it to the repositories listed below.
- Unifying Specialized Visual Encoders for Video Language Models ☆21 Updated 3 weeks ago
- TemporalBench: Benchmarking Fine-grained Temporal Understanding for Multimodal Video Models ☆33 Updated 8 months ago
- Official repo for "Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge" ICLR2025 ☆59 Updated 4 months ago
- Source code for the Paper "Mind the Gap: Benchmarking Spatial Reasoning in Vision-Language Models" ☆12 Updated last month
- Official Implementation of ISR-DPO: Aligning Large Multimodal Models for Videos by Iterative Self-Retrospective DPO (AAAI'25) ☆20 Updated 5 months ago
- [ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion ☆48 Updated 2 weeks ago
- [ECCV 2024] Learning Video Context as Interleaved Multimodal Sequences ☆39 Updated 4 months ago
- [CVPR 2025] Official PyTorch Implementation of GLUS: Global-Local Reasoning Unified into A Single Large Language Model for Video Segmenta… ☆45 Updated 3 weeks ago
- ☆35 Updated 10 months ago
- Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning ☆24 Updated last week
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives" ☆41 Updated 7 months ago
- Video-Panda: Parameter-efficient Alignment for Encoder-free Video-Language Models [CVPR 2025] ☆72 Updated 3 weeks ago
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆49 Updated 4 months ago
- ACL'24 (Oral) Tuning Large Multimodal Models for Videos using Reinforcement Learning from AI Feedback ☆67 Updated 10 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆29 Updated last week
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆23 Updated 3 months ago
- ☆21 Updated 3 months ago
- ☆37 Updated last month
- Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆32 Updated 2 months ago
- Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? ☆60 Updated this week
- [ECCV 2024 Oral] Official implementation of the paper "DEVIAS: Learning Disentangled Video Representations of Action and Scene" ☆20 Updated 9 months ago
- [ICLR 2023] CoVLM: Composing Visual Entities and Relationships in Large Language Models Via Communicative Decoding ☆45 Updated last month
- Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024) ☆28 Updated 11 months ago
- Official code for MotionBench (CVPR 2025) ☆49 Updated 4 months ago
- ☆58 Updated last year
- Action Scene Graphs for Long-Form Understanding of Egocentric Videos (CVPR 2024) ☆41 Updated 3 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆85 Updated 3 weeks ago
- ☆32 Updated 3 months ago
- VPEval Codebase from Visual Programming for Text-to-Image Generation and Evaluation (NeurIPS 2023) ☆45 Updated last year
- ☆13 Updated 7 months ago