XiaoduoAILab / XmodelVLM
☆58Updated 4 months ago
Related projects
Alternatives and complementary repositories for XmodelVLM
- E5-V: Universal Embeddings with Multimodal Large Language Models☆167Updated 3 months ago
- Code for Paper: Harnessing Webpage UIs for Text-Rich Visual Understanding☆37Updated 3 weeks ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated 10 months ago
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model…☆34Updated last year
- InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)☆137Updated 5 months ago
- Code for the paper "HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts via HyperNetwork"☆31Updated 11 months ago
- ☆86Updated 9 months ago
- ☆55Updated 3 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"☆29Updated last week
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight)☆160Updated last month
- Python Library to evaluate VLM models' robustness across diverse benchmarks☆168Updated last week
- ClickDiffusion: Harnessing LLMs for Interactive Precise Image Editing☆65Updated 5 months ago
- ☆57Updated last month
- ☆62Updated last month
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆31Updated 4 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆62Updated 2 weeks ago
- This is the repo for the paper "PANGEA: A FULLY OPEN MULTILINGUAL MULTIMODAL LLM FOR 39 LANGUAGES"☆88Updated this week
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B…☆81Updated last month
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC)☆79Updated 7 months ago
- Video-LLaVA fine-tune for CinePile evaluation☆38Updated 3 months ago
- ☆103Updated 2 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models☆69Updated last month
- Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"☆76Updated 6 months ago
- ☆25Updated 2 months ago
- ☆36Updated 3 months ago
- A family of highly capable yet efficient large multimodal models☆161Updated 2 months ago
- This repository contains the code for the paper: SirLLM: Streaming Infinite Retentive LLM☆55Updated 5 months ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Updated last year
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image …☆51Updated 3 weeks ago
- ☆67Updated 3 weeks ago