[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆56Mar 31, 2025Updated 11 months ago
Alternatives and similar repositories for VLoRA
Users that are interested in VLoRA are comparing it to the libraries listed below
Sorting:
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25Nov 23, 2024Updated last year
- Preference Learning for LLaVA☆59Nov 9, 2024Updated last year
- LLMBind: A Unified Modality-Task Integration Framework☆19Jun 16, 2024Updated last year
- ☆10Apr 7, 2025Updated 11 months ago
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning☆43Nov 26, 2024Updated last year
- Various test models in WNNX format. It can view with `pip install wnetron && wnetron`☆12Jun 22, 2022Updated 3 years ago
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆35Jul 15, 2025Updated 7 months ago
- Recent Advances on MLLM's Reasoning Ability☆26Apr 11, 2025Updated 10 months ago
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆60Jun 6, 2025Updated 9 months ago
- ☆13Mar 28, 2025Updated 11 months ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking☆13May 3, 2024Updated last year
- [ICLR 2026] Official repo for "FrameThinker: Learning to Think with Long Videos via Multi-Turn Frame Spotlighting"☆38Oct 9, 2025Updated 4 months ago
- 【CVPRW'23】First Place Solution to the CVPR'2023 AQTC Challenge☆15Jul 18, 2023Updated 2 years ago
- A Massive Multi-Discipline Lecture Understanding Benchmark☆33Nov 1, 2025Updated 4 months ago
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- ☆15May 15, 2025Updated 9 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆35Jul 1, 2024Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆41Jan 26, 2026Updated last month
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"☆35Jun 12, 2025Updated 8 months ago
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025)☆38Oct 8, 2025Updated 4 months ago
- [AAAI2025] Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient☆44Apr 17, 2025Updated 10 months ago
- TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…☆25Jun 4, 2025Updated 9 months ago
- CatMAE☆14Dec 13, 2023Updated 2 years ago
- [ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation☆57Sep 12, 2025Updated 5 months ago
- Open source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".☆39Jan 4, 2026Updated 2 months ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆77Jul 13, 2024Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs☆176Oct 6, 2025Updated 5 months ago
- ☆20Sep 19, 2023Updated 2 years ago
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga☆145Jan 19, 2026Updated last month
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆89May 20, 2025Updated 9 months ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025]☆101Jul 28, 2025Updated 7 months ago
- [ECCV 2024] OpenPSG: Open-set Panoptic Scene Graph Generation via Large Multimodal Models☆49Jan 8, 2025Updated last year
- ☆34May 12, 2025Updated 9 months ago
- Detectron2 Toolbox and Benchmark for V3Det☆18Jun 2, 2024Updated last year
- ☆21Jan 17, 2025Updated last year
- [NeurIPS25] Official Implementation (Pytorch) of "DeepVideo-R1"☆31Feb 22, 2026Updated last week
- [ECCV24] VISA: Reasoning Video Object Segmentation via Large Language Model☆19Jul 20, 2024Updated last year