[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆56Mar 31, 2025Updated 11 months ago
Alternatives and similar repositories for VLoRA
Users that are interested in VLoRA are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.
Sorting:
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning☆44Nov 26, 2024Updated last year
- COLA: Evaluate how well your vision-language model can Compose Objects Localized with Attributes!☆25Nov 23, 2024Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs☆43Mar 6, 2026Updated 3 weeks ago
- ☆20Sep 19, 2023Updated 2 years ago
- Preference Learning for LLaVA☆59Nov 9, 2024Updated last year
- Managed hosting for WordPress and PHP on Cloudways • AdManaged hosting with the flexibility to host WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Cloudways by DigitalOcean.
- ☆10Apr 7, 2025Updated 11 months ago
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms☆25Dec 21, 2025Updated 3 months ago
- LLMBind: A Unified Modality-Task Integration Framework☆19Jun 16, 2024Updated last year
- Recent Advances on MLLM's Reasoning Ability☆26Apr 11, 2025Updated 11 months ago
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning☆24Sep 9, 2024Updated last year
- A Massive Multi-Discipline Lecture Understanding Benchmark☆33Nov 1, 2025Updated 4 months ago
- ☆26Oct 7, 2024Updated last year
- ☆13Jun 5, 2024Updated last year
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga☆146Jan 19, 2026Updated 2 months ago
- Managed Kubernetes at scale on DigitalOcean • AdDigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
- CatMAE☆14Dec 13, 2023Updated 2 years ago
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models☆89May 20, 2025Updated 10 months ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking☆13May 3, 2024Updated last year
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning☆36Jul 15, 2025Updated 8 months ago
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation☆83Jun 13, 2025Updated 9 months ago
- Various test models in WNNX format. It can view with `pip install wnetron && wnetron`☆12Jun 22, 2022Updated 3 years ago
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark☆27Jan 4, 2026Updated 2 months ago
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation☆21Jul 2, 2024Updated last year
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024]☆24Aug 13, 2024Updated last year
- DigitalOcean Gradient AI Platform • AdBuild production-ready AI agents using customizable tools or access multiple LLMs through a single endpoint. Create custom knowledge bases or connect external data.
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)"☆25Feb 2, 2025Updated last year
- [ECCV'24 Oral] PiTe: Pixel-Temporal Alignment for Large Video-Language Model☆17Feb 13, 2025Updated last year
- [ICCV 2025] HQ-CLIP: Leveraging Large Vision-Language Models to Create High-Quality Image-Text Datasets☆64Aug 6, 2025Updated 7 months ago
- Code for paper: Unified Text-to-Image Generation and Retrieval☆16Jul 6, 2024Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs☆177Oct 6, 2025Updated 5 months ago
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025)☆38Oct 8, 2025Updated 5 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]☆21Feb 27, 2025Updated last year
- ☆40Jul 14, 2025Updated 8 months ago
- Rethinking the Form of Latent States in Image Captioning☆20Aug 31, 2018Updated 7 years ago
- Proton VPN Special Offer - Get 70% off • AdSpecial partner offer. Trusted by over 100 million users worldwide. Tested, Approved and Recommended by Experts.
- ☆13Mar 28, 2025Updated 11 months ago
- [CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection☆185Oct 25, 2023Updated 2 years ago
- Open-source evaluation toolkit of large multi-modality models (LMMs), support 220+ LMMs, 80+ benchmarks☆3,958Updated this week
- A curated list of papers, datasets and resources pertaining to zero-shot object detection.☆29Mar 15, 2023Updated 3 years ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models☆77Jul 13, 2024Updated last year
- [CVPR2025] VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding☆24Mar 24, 2025Updated last year
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Oct 12, 2024Updated last year