NeverMoreLCH / SearchLVLMs
Repository for the NeurIPS 2024 paper "SearchLVLMs: A Plug-and-Play Framework for Augmenting Large Vision-Language Models by Searching Up-to-Date Internet Knowledge"
☆19 Updated 2 months ago
Alternatives and similar repositories for SearchLVLMs:
Users who are interested in SearchLVLMs are comparing it to the repositories listed below.
- Repo for paper "T2Vid: Translating Long Text into Multi-Image is the Catalyst for Video-LLMs" ☆49 Updated last month
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines ☆115 Updated 3 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs ☆40 Updated 8 months ago
- Official Repository of VideoLLaMB: Long Video Understanding with Recurrent Memory Bridges ☆63 Updated this week
- This is the official repo for ByteVideoLLM/Dynamic-VLM ☆19 Updated 2 months ago
- LAVIS - A One-stop Library for Language-Vision Intelligence ☆48 Updated 6 months ago
- [ICLR2025] MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models ☆62 Updated 5 months ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment ☆33 Updated 8 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo… ☆72 Updated last month
- LMM solved catastrophic forgetting, AAAI2025 ☆38 Updated 3 months ago
- Official implementation of MIA-DPO ☆49 Updated last month
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect… ☆35 Updated 8 months ago
- Official PyTorch Implementation of MLLM Is a Strong Reranker: Advancing Multimodal Retrieval-augmented Generation via Knowledge-enhanced … ☆59 Updated 3 months ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models ☆59 Updated 4 months ago
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆41 Updated this week
- Official repository of "Inst-IT: Boosting Multimodal Instance Understanding via Explicit Visual Prompt Instruction Tuning" ☆27 Updated last week
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs ☆74 Updated 4 months ago
- A lightweight and highly efficient training framework for accelerating diffusion tasks. ☆46 Updated 5 months ago
- ✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans? ☆93 Updated 2 weeks ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 Updated 5 months ago
- [EMNLP 2023] TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding ☆49 Updated last year
- An Easy-to-use Hallucination Detection Framework for LLMs. ☆57 Updated 10 months ago