XiaoduoAILab / XmodelVLM
☆59Updated 5 months ago
Related projects
Alternatives and complementary repositories for XmodelVLM
- E5-V: Universal Embeddings with Multimodal Large Language Models☆173Updated 4 months ago
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)☆146Updated 5 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"☆30Updated 3 weeks ago
- ✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models☆137Updated last week
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight)☆161Updated last month
- Implementation of PALI3 from the paper "PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"☆142Updated last week
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆38Updated last month
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆17Updated 3 weeks ago
- ☆87Updated 10 months ago
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆65Updated 6 months ago
- LLaVA-HR: High-Resolution Large Language-Vision Assistant☆212Updated 3 months ago
- Matryoshka Multimodal Models☆82Updated this week
- [EMNLP 2024] Official PyTorch implementation code for realizing the technical part of Traversal of Layers (TroL) presenting new propagati…☆88Updated 4 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated 10 months ago
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC)☆80Updated 8 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆62Updated 3 weeks ago
- ☆55Updated 4 months ago
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆179Updated last month
- Code for Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models☆124Updated 3 weeks ago
- ☆146Updated last month
- A family of highly capable yet efficient large multimodal models☆166Updated 2 months ago
- [NeurIPS 2024] MoVA: Adapting Mixture of Vision Experts to Multimodal Context☆132Updated last month
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated 2 months ago
- We propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editi…☆24Updated 2 months ago
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆61Updated 3 weeks ago
- ☆23Updated last week
- Parameter-efficient finetuning script for Phi-3-vision, the strong multimodal language model by Microsoft.☆54Updated 5 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆70Updated 2 months ago
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"☆91Updated 2 weeks ago
- [NeurIPS 2024] Official PyTorch implementation code for realizing the technical part of Mamba-based traversal of rationale (Meteor) to im…☆102Updated 5 months ago