sunshine-JLU / deepseek-janus-pro-lora
The objective of this project is to demonstrate how to fine-tune deepseek-janus-pro-lora.
☆22Updated 2 months ago
Alternatives and similar repositories for deepseek-janus-pro-lora:
Users who are interested in deepseek-janus-pro-lora are comparing it to the libraries listed below:
- Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines☆123Updated 5 months ago
- Fine-tuning Qwen2.5-VL for vision-language tasks | Optimized for Vision understanding | LoRA & PEFT support.☆60Updated 2 months ago
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆89Updated 6 months ago
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding☆56Updated 6 months ago
- ☆73Updated 5 months ago
- Precision Search through Multi-Style Inputs☆68Updated this week
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 7 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆39Updated 7 months ago
- ☆33Updated 2 months ago
- The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"☆27Updated this week
- Encourage Medical LLM to engage in deep thinking similar to DeepSeek-R1.☆25Updated this week
- Train a LLaVA model with better Chinese support, and open-source the training code and data.☆54Updated 7 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆90Updated 3 months ago
- Mamba-YOLO-World: Marrying YOLO-World with Mamba for Open-Vocabulary Detection☆82Updated last month
- 【ArXiv】PDF-Wukong: A Large Multimodal Model for Efficient Long PDF Reading with End-to-End Sparse Sampling☆116Updated 6 months ago
- ☆51Updated last week
- ☆23Updated 8 months ago
- [CVPR 2024 Highlight] Official GraCo: Granularity-Controllable Interactive Segmentation.☆53Updated last month
- YOLO-UniOW: Efficient Universal Open-World Object Detection☆115Updated 3 months ago
- [CVPR2025] Official implementation of High Fidelity Scene Text Synthesis.☆60Updated last month
- SAM4MLLM: Enhance Multi-Modal Large Language Model for Referring Expression Segmentation, 2024☆25Updated last month
- Workshop on Foundation Model 1st foundation model challenge Track1 codebase (Open TransMind v1.0)☆18Updated 2 years ago
- Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models☆60Updated 5 months ago
- The official repository for the RealSyn dataset☆21Updated 2 months ago
- ☆56Updated last year
- Official implementation of 🛸 "UFO: A Unified Approach to Fine-grained Visual Perception via Open-ended Language Interface"☆176Updated 3 weeks ago
- ☆51Updated this week
- Building a VLM model starts from the basic module.☆14Updated last year
- ☆83Updated 11 months ago
- A Token-level Text Image Foundation Model for Document Understanding☆91Updated last month