[NeurIPS 2024] Visual Perception by Large Language Model’s Weights
☆55 · Mar 31, 2025 · Updated last year
Alternatives and similar repositories for VLoRA
Users that are interested in VLoRA are comparing it to the libraries listed below.
- [NeurIPS2023] Exploring Diverse In-Context Configurations for Image Captioning ☆47 · Nov 26, 2024 · Updated last year
- WorldSense: Evaluating Real-world Omnimodal Understanding for Multimodal LLMs ☆49 · May 7, 2026 · Updated last month
- EoFormer: Edge-oriented Transformer for Brain Tumor Segmentation ☆27 · Jul 7, 2024 · Updated last year
- Open source implementation of the paper "MM-Vid: Advancing Video Understanding with GPT-4V(ision)". ☆44 · Jan 4, 2026 · Updated 5 months ago
- Preference Learning for LLaVA ☆59 · Nov 9, 2024 · Updated last year
- ☆10 · Apr 7, 2025 · Updated last year
- CrossLMM: Decoupling Long Video Sequences from LMMs via Dual Cross-Attention Mechanisms ☆25 · Dec 21, 2025 · Updated 5 months ago
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆35 · Jun 30, 2025 · Updated 11 months ago
- LLMBind: A Unified Modality-Task Integration Framework ☆19 · Jun 16, 2024 · Updated last year
- Recent Advances on MLLM's Reasoning Ability ☆26 · Apr 11, 2025 · Updated last year
- Look, Compare, Decide: Alleviating Hallucination in Large Vision-Language Models via Multi-View Multi-Path Reasoning ☆24 · Sep 9, 2024 · Updated last year
- Reinforcement Learning Tuning for VideoLLMs: Reward Design and Data Efficiency ☆62 · Jun 6, 2025 · Updated last year
- A Massive Multi-Discipline Lecture Understanding Benchmark ☆34 · Apr 20, 2026 · Updated last month
- ☆25 · Oct 7, 2024 · Updated last year
- ☆16 · May 15, 2025 · Updated last year
- ☆33 · Nov 18, 2025 · Updated 6 months ago
- ☆13 · Jun 5, 2024 · Updated 2 years ago
- [CVPR2025] Number it: Temporal Grounding Videos like Flipping Manga ☆149 · Jan 19, 2026 · Updated 4 months ago
- Repo for NTK-Guided Few-Shot Class Incremental Learning (TIP2024) ☆15 · Mar 8, 2026 · Updated 3 months ago
- CatMAE ☆15 · Dec 13, 2023 · Updated 2 years ago
- [NAACL 2024] Z-GMOT: Zero-shot Generic Multiple Object Tracking ☆12 · May 19, 2026 · Updated 3 weeks ago
- [ACL2025 Findings] Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models ☆89 · May 20, 2025 · Updated last year
- 「AAAI 2024」 Referred by Multi-Modality: A Unified Temporal Transformers for Video Object Segmentation ☆85 · Jun 13, 2025 · Updated last year
- VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning ☆37 · May 9, 2026 · Updated last month
- Various test models in WNNX format. They can be viewed with `pip install wnetron && wnetron` ☆12 · Jun 22, 2022 · Updated 3 years ago
- [AAAI2025] Video Repurposing from User Generated Content: A Large-scale Dataset and Benchmark ☆30 · Apr 4, 2026 · Updated 2 months ago
- 「ECCV 2024」 PanoVOS: Bridging Non-panoramic and Panoramic Views with Transformer for Video Segmentation ☆22 · Jul 2, 2024 · Updated last year
- Code for Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense? [COLM 2024] ☆24 · Aug 13, 2024 · Updated last year
- Code implementation of paper "MUSE: Mamba is Efficient Multi-scale Learner for Text-video Retrieval (AAAI2025)" ☆26 · Feb 2, 2025 · Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs ☆177 · Oct 6, 2025 · Updated 8 months ago
- Official implementation of "Reasoning by Superposition: A Theoretical Perspective on Chain of Continuous Thought" (NeurIPS 2025) ☆40 · Oct 8, 2025 · Updated 8 months ago
- The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025] ☆21 · Feb 27, 2025 · Updated last year
- ☆25 · Dec 26, 2024 · Updated last year
- Rethinking the Form of Latent States in Image Captioning ☆20 · Aug 31, 2018 · Updated 7 years ago
- ☆13 · Mar 28, 2025 · Updated last year
- ☆43 · Jul 14, 2025 · Updated 11 months ago
- [CVPR2023] Code Release of Aligning Bag of Regions for Open-Vocabulary Object Detection ☆186 · Oct 25, 2023 · Updated 2 years ago
- A curated list of papers, datasets and resources pertaining to zero-shot object detection. ☆29 · Mar 15, 2023 · Updated 3 years ago
- VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models ☆79 · Jul 13, 2024 · Updated last year