XiaoduoAILab / XmodelVLM
☆65 Updated 6 months ago
Alternatives and similar repositories for XmodelVLM:
Users who are interested in XmodelVLM are comparing it to the repositories listed below.
- OLA-VLM: Elevating Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆45 Updated last month
- [WACV 2025] Official implementation of "Online-LoRA: Task-free Online Continual Learning via Low Rank Adaptation" by Xiwen Wei, Guihong L… ☆29 Updated 2 months ago
- Code for the paper "Harnessing Webpage UIs for Text-Rich Visual Understanding" ☆44 Updated last month
- ☆57 Updated 6 months ago
- Rethinking Step-by-step Visual Reasoning in LLMs ☆151 Updated this week
- A tool to assist in the interpretation of learned features in sparse autoencoders (in particular the four SAEs trained by Joseph Bloom o… ☆17 Updated 3 months ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling ☆28 Updated 2 months ago
- This is a public repository for Image Clustering Conditioned on Text Criteria (IC|TC) ☆83 Updated 9 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models" ☆35 Updated last year
- Multi-vision Sensor Perception and Reasoning (MS-PR) benchmark, assessing VLMs on their capacity for sensor-specific reasoning. ☆11 Updated 2 weeks ago
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model ☆41 Updated 5 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆118 Updated 4 months ago
- World's Smallest Vision-Language Model ☆24 Updated 9 months ago
- Official PyTorch Implementation of Self-emerging Token Labeling ☆32 Updated 9 months ago
- XmodelLM ☆37 Updated last month
- Official implementation of Add-SD: Rational Generation without Manual Reference. ☆27 Updated 4 months ago
- ☆155 Updated 3 months ago
- ☆60 Updated 3 months ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight) ☆169 Updated 3 weeks ago
- We propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editi… ☆29 Updated 4 months ago
- (WACV 2025) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, Hindi, B… ☆81 Updated 4 months ago
- Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models ☆72 Updated 4 months ago
- Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts" ☆76 Updated 8 months ago
- A family of highly capable yet efficient large multimodal models ☆176 Updated 4 months ago
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning" ☆18 Updated 2 months ago
- Data release for the ImageInWords (IIW) paper. ☆205 Updated 2 months ago
- ☆25 Updated last year
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges ☆30 Updated last year
- Unofficial implementation and experiments related to Set-of-Mark (SoM) 👁️ ☆79 Updated last year
- Evaluate the performance of computer vision models and prompts for zero-shot models (Grounding DINO, CLIP, BLIP, DINOv2, ImageBind, model… ☆34 Updated last year