[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆56Mar 31, 2025Updated last year
Alternatives and similar repositories for VLoRA
Users that are interested in VLoRA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning☆47Nov 26, 2024Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25May 14, 2026Updated last week
- ☆20Sep 19, 2023Updated 2 years ago
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆49May 7, 2026Updated 2 weeks ago
- EoFormer: Edge-oriented Transformer for Brain Tumor Segmentation☆26Jul 7, 2024Updated last year
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- Open source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)".☆43Jan 4, 2026Updated 4 months ago
- ☆10Apr 7, 2025Updated last year
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated 5 months ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation☆35Jun 30, 2025Updated 10 months ago
- LLMBind: A Unified Modality-Task Integration Framework☆19Jun 16, 2024Updated last year
- Recent Advances on MLLM's Reasoning Ability☆26Apr 11, 2025Updated last year
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆24Sep 9, 2024Updated last year
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency☆62Jun 6, 2025Updated 11 months ago
- A Massive Multi-Discipline Lecture Understanding Benchmark☆34Apr 20, 2026Updated last month
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- ☆25Oct 7, 2024Updated last year
- ☆16May 15, 2025Updated last year
- ☆13Jun 5, 2024Updated last year
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga☆149Jan 19, 2026Updated 4 months ago
- Repo for NTK-Guided Few-Shot Class Incremental Learning (TIP2024)☆15Mar 8, 2026Updated 2 months ago
- CatMAE☆15Dec 13, 2023Updated 2 years ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking☆12May 19, 2026Updated last week
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆89May 20, 2025Updated last year
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation☆84Jun 13, 2025Updated 11 months ago
- Open source password manager - Proton Pass • AdSecurely store, share, and autofill your credentials with Proton Pass, the end-to-end encrypted password manager trusted by millions.
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆37May 9, 2026Updated 2 weeks ago
- Various test models in WNNX format. It can view with `pip install wnetron && wnetron`☆12Jun 22, 2022Updated 3 years ago
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark☆30Apr 4, 2026Updated last month
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation☆22Jul 2, 2024Updated last year
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]☆24Aug 13, 2024Updated last year
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs☆177Oct 6, 2025Updated 7 months ago
- Deploy open-source AI quickly and easily - Special Bonus Offer • AdRunpod Hub is built for open source. One-click deployment and autoscaling endpoints without provisioning your own infrastructure.
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025)☆39Oct 8, 2025Updated 7 months ago
- [ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets☆66Aug 6, 2025Updated 9 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year
- LLM4DistReconfig: A Fine-tuned Large Language Model for Power Distribution Network Reconfiguration☆27Mar 18, 2025Updated last year
- ☆13Mar 28, 2025Updated last year
- ☆42Jul 14, 2025Updated 10 months ago
- [CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection☆186Oct 25, 2023Updated 2 years ago