Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines
☆130Nov 6, 2024Updated last year
Alternatives and similar repositories for VSA
Users who are interested in VSA are comparing it to the libraries listed below
- Parsing-free RAG supported by VLMs☆917Dec 7, 2025Updated 2 months ago
- Repo for Benchmarking Multimodal Retrieval Augmented Generation with Dynamic VQA Dataset and Self-adaptive Planning Agent☆414Apr 22, 2025Updated 10 months ago
- ☆98Jun 23, 2025Updated 8 months ago
- Large Multimodal Model☆15Apr 8, 2024Updated last year
- ☆18Jun 10, 2025Updated 8 months ago
- [NAACL 2025] Representing Rule-based Chatbots with Transformers☆23Feb 9, 2025Updated last year
- Official code implementation of the paper: QuCo-RAG: Quantifying Uncertainty from the Pre-training Corpus for Dynamic Retrieval-Augmente…☆38Jan 10, 2026Updated last month
- Code and data of "Controllable Unsupervised Event-based Video Generation" (accepted as ICIP oral and invited by WACV workshop)☆19Nov 5, 2024Updated last year
- Introduces a novel Video Trimming (VT) task and proposes an agent-based approach (AVT) for detecting wasted footage, selecting valuable se…☆23Jan 20, 2025Updated last year
- [ICCV 2025] Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges☆83Feb 27, 2025Updated last year
- [NeurIPS VLM workshop 2024] In-Context Ensemble Learning from Pseudo Labels Improves Video-Language Models for Low-Level Workflow Underst…☆23Mar 16, 2025Updated 11 months ago
- NeurIPS 2024 Paper: A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing☆579Oct 20, 2024Updated last year
- A novel Multimodal Large Language Model (MLLM) architecture, designed to structurally align visual and textual embeddings.☆1,432Feb 11, 2026Updated 2 weeks ago
- [ICCV 2025] Explore the Limits of Omni-modal Pretraining at Scale☆123Sep 2, 2024Updated last year
- ☆91Feb 23, 2026Updated last week
- Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"☆86Mar 21, 2024Updated last year
- Kimi-VL: Mixture-of-Experts Vision-Language Model for Multimodal Reasoning, Long-Context Understanding, and Strong Agent Capabilities☆1,164Jul 15, 2025Updated 7 months ago
- ☆13Feb 2, 2025Updated last year
- Normalization Matters in Weakly Supervised Object Localization (ICCV 2021)☆11Oct 24, 2021Updated 4 years ago
- Southeast University Knowledge Graph-OpenRichpedia☆41Aug 28, 2021Updated 4 years ago
- 🔥 [NeurIPS 2025] Official implementation of "Generate, but Verify: Reducing Visual Hallucination in Vision-Language Models with Retrospe…☆52Jan 22, 2026Updated last month
- ☆28Apr 8, 2025Updated 10 months ago
- [NeurIPS 2024] TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration☆26Oct 17, 2024Updated last year
- ☆34Oct 9, 2025Updated 4 months ago
- [NeurIPS 2025] Official code implementation of Perception R1: Pioneering Perception Policy with Reinforcement Learning☆286Jul 15, 2025Updated 7 months ago
- A digital twin of the city of Chicago along with automated sensors☆12Nov 14, 2019Updated 6 years ago
- EgoToM is an egocentric theory-of-mind benchmark built on Ego4D videos, containing multi-choice questions that evaluate multimodal large …☆13Apr 1, 2025Updated 11 months ago
- ☆96Dec 6, 2024Updated last year
- [NeurIPS 2024] RaVL: Discovering and Mitigating Spurious Correlations in Fine-Tuned Vision-Language Models☆31Nov 12, 2024Updated last year
- [ICCV 2025 Highlight] The official repository for "2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining"☆194Mar 17, 2025Updated 11 months ago
- ☆190Feb 5, 2026Updated 3 weeks ago
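- The simplest reproduction of R1 results on small models, explaining the most important essence of O1-like models and DeepSeek R1. Think is all your need. Backed by experiments: for strong reasoning ability, the thinking-process content is the core of AGI/ASI.☆45Feb 8, 2025Updated last year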
- Awesome Self-Supervised Vision Learning☆11Mar 27, 2024Updated last year
- ☆13Apr 23, 2025Updated 10 months ago
- The official repo for [ACM CSUR'24] "Empowering Agrifood System with Artificial Intelligence: A Survey of the Progress, Challenges and Op…☆12Dec 6, 2024Updated last year
- This is an implementation of the paper "Are We Done with Object-Centric Learning?"☆12Sep 11, 2025Updated 5 months ago
- ☆14Nov 23, 2023Updated 2 years ago
- ☆13Jul 10, 2024Updated last year
- Codes for our paper "AgentMonitor: A Plug-and-Play Framework for Predictive and Secure Multi-Agent Systems"☆13Dec 13, 2024Updated last year