[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆56 · Mar 31, 2025 · Updated last year
Alternatives and similar repositories for VLoRA
Users who are interested in VLoRA are comparing it to the libraries listed below.
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning ☆44 · Nov 26, 2024 · Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes! ☆25 · Nov 23, 2024 · Updated last year
- ☆20 · Sep 19, 2023 · Updated 2 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆47 · Apr 28, 2026 · Updated last week
- EoFormer: Edge-oriented Transformer for Brain Tumor Segmentation ☆26 · Jul 7, 2024 · Updated last year
- Open source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)". ☆40 · Jan 4, 2026 · Updated 4 months ago
- Preference Learning for LLaVA ☆59 · Nov 9, 2024 · Updated last year
- ☆10 · Apr 7, 2025 · Updated last year
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms ☆25 · Dec 21, 2025 · Updated 4 months ago
- LLMBind: A Unified Modality-Task Integration Framework ☆19 · Jun 16, 2024 · Updated last year
- Recent Advances on MLLM's Reasoning Ability ☆26 · Apr 11, 2025 · Updated last year
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Sep 9, 2024 · Updated last year
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency ☆62 · Jun 6, 2025 · Updated 11 months ago
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆34 · Apr 20, 2026 · Updated 2 weeks ago
- ☆25 · Oct 7, 2024 · Updated last year
- ☆16 · May 15, 2025 · Updated 11 months ago
- ☆13 · Jun 5, 2024 · Updated last year
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga ☆146 · Jan 19, 2026 · Updated 3 months ago
- CatMAE ☆14 · Dec 13, 2023 · Updated 2 years ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking ☆13 · May 3, 2024 · Updated 2 years ago
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆89 · May 20, 2025 · Updated 11 months ago
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆36 · Jul 15, 2025 · Updated 9 months ago
- Various test models in WNNX format. View them with `pip install wnetron && wnetron` ☆12 · Jun 22, 2022 · Updated 3 years ago
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark ☆27 · Apr 4, 2026 · Updated last month
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation ☆22 · Jul 2, 2024 · Updated last year
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆24 · Aug 13, 2024 · Updated last year
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆25 · Feb 2, 2025 · Updated last year
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model ☆17 · Feb 13, 2025 · Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval ☆16 · Jul 6, 2024 · Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs ☆176 · Oct 6, 2025 · Updated 7 months ago
- AML Command Transfer. A lightweight tool to transfer any command line to Azure Machine Learning Services ☆20 · May 23, 2024 · Updated last year
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025) ☆39 · Oct 8, 2025 · Updated 6 months ago
- [ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets ☆65 · Aug 6, 2025 · Updated 9 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆21 · Feb 27, 2025 · Updated last year
- ☆26 · Dec 26, 2024 · Updated last year
- ☆13 · Mar 28, 2025 · Updated last year
- ☆41 · Jul 14, 2025 · Updated 9 months ago
- [CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection ☆186 · Oct 25, 2023 · Updated 2 years ago
- A curated list of papers, datasets and resources pertaining to zero-shot object detection. ☆29 · Mar 15, 2023 · Updated 3 years ago