FeipengMa6 / VLoRA
[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆55 · Mar 31, 2025 · Updated 10 months ago
Alternatives and similar repositories for VLoRA
Users that are interested in VLoRA are comparing it to the libraries listed below
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆25 · Nov 23, 2024 · Updated last year
- Preference Learning for LLaVA ☆59 · Nov 9, 2024 · Updated last year
- LLMBind: A Unified Modality-Task Integration Framework ☆19 · Jun 16, 2024 · Updated last year
- ☆10 · Apr 7, 2025 · Updated 10 months ago
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning ☆43 · Nov 26, 2024 · Updated last year
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting" ☆37 · Oct 9, 2025 · Updated 4 months ago
- Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron` ☆12 · Jun 22, 2022 · Updated 3 years ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆35 · Jul 15, 2025 · Updated 6 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency ☆60 · Jun 6, 2025 · Updated 8 months ago
- 【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge ☆15 · Jul 18, 2023 · Updated 2 years ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking ☆13 · May 3, 2024 · Updated last year
- ☆13 · Mar 28, 2025 · Updated 10 months ago
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆32 · Nov 1, 2025 · Updated 3 months ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆38 · Jan 26, 2026 · Updated 2 weeks ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Feb 13, 2025 · Updated last year
- ☆15 · May 15, 2025 · Updated 8 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆35 · Jul 1, 2024 · Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆16 · Jul 6, 2024 · Updated last year
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking" ☆34 · Jun 12, 2025 · Updated 8 months ago
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025) ☆37 · Oct 8, 2025 · Updated 4 months ago
- [AAAI2025] Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient ☆44 · Apr 17, 2025 · Updated 9 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u… ☆25 · Jun 4, 2025 · Updated 8 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms ☆25 · Dec 21, 2025 · Updated last month
- Open source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)". ☆39 · Jan 4, 2026 · Updated last month
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation ☆56 · Sep 12, 2025 · Updated 5 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆77 · Jul 13, 2024 · Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs ☆176 · Oct 6, 2025 · Updated 4 months ago
- ☆34 · May 12, 2025 · Updated 9 months ago
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga ☆144 · Jan 19, 2026 · Updated 3 weeks ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025] ☆100 · Jul 28, 2025 · Updated 6 months ago
- The WorldRWKV project aims to implement training and inference across various modalities using the RWKV7 architecture. By leveraging diff… ☆66 · Dec 15, 2025 · Updated last month
- Youtu-VL: Unleashing Visual Potential via Unified Vision-Language Supervision ☆134 · Feb 6, 2026 · Updated last week
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models ☆49 · Jan 8, 2025 · Updated last year
- Open-Vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models (ICCV 20… ☆18 · Apr 23, 2024 · Updated last year
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding ☆24 · Mar 24, 2025 · Updated 10 months ago
- [CVPR2024] Generative Region-Language Pretraining for Open-Ended Object Detection ☆190 · Mar 29, 2025 · Updated 10 months ago
- Detectron2 Toolbox and Benchmark for V3Det ☆18 · Jun 2, 2024 · Updated last year
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model ☆19 · Jul 20, 2024 · Updated last year
- ☆21 · Jan 17, 2025 · Updated last year