OpenDocCN / python-code-anls
☆30Updated this week
Alternatives and similar repositories for python-code-anls:
Users that are interested in python-code-anls are comparing it to the libraries listed below
- Video dataset dedicated to portrait-mode video recognition.☆43Updated last month
- The official implementation of RAR☆79Updated 10 months ago
- The official repo for “TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding”.☆38Updated 4 months ago
- [ECCV 2024] SegVG: Transferring Object Bounding Box to Segmentation for Visual Grounding☆50Updated 3 months ago
- Multimodal Open-O1 (MO1) is designed to enhance the accuracy of inference models by utilizing a novel prompt-based approach. This tool wo…☆29Updated 4 months ago
- Open-vocabulary Semantic Segmentation☆34Updated 11 months ago
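- PyTorch model training code covering single precision, half precision, mixed precision, single GPU, multi-GPU (DP / DDP), FSDP, and DeepSpeed, with a comparison of training speed and GPU memory usage across the methods☆87Updated 10 months ago
- [Produced by CVer] Aims to be the most comprehensive roundup of computer vision research directions☆34Updated last year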
- [CVPR'24] Multimodal Pathway: Improve Transformers with Irrelevant Data from Other Modalities☆98Updated 10 months ago
- Multimodal MM + Chat collection☆238Updated 3 weeks ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs☆85Updated 2 weeks ago
- Precision Search through Multi-Style Inputs☆62Updated 6 months ago
- An LMM that is a strict superset of its embedded LLM☆37Updated 2 months ago
- [MM2024, oral] "Self-Supervised Visual Preference Alignment" https://arxiv.org/abs/2404.10501☆48Updated 6 months ago
- Building a VLM model starts from the basic module.☆11Updated 9 months ago
- GroundVLP: Harnessing Zero-shot Visual Grounding from Vision-Language Pre-training and Open-Vocabulary Object Detection (AAAI 2024)☆63Updated last year
- [BMVC 2024] PlainMamba: Improving Non-hierarchical Mamba in Visual Recognition☆72Updated 5 months ago
- DeepSpeed Tutorial☆94Updated 5 months ago
- [IEEE TCSVT] Official Pytorch Implementation of CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation.☆37Updated 3 weeks ago
- [ICLR 2025] LLaVA-MoD: Making LLaVA Tiny via MoE-Knowledge Distillation☆81Updated last week
- Train InternViT-6B in MMSegmentation and MMDetection with DeepSpeed☆75Updated 3 months ago
- A PyTorch implementation of DiT, intended mainly for learning the DiT architecture.☆70Updated 11 months ago
- UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model☆20Updated 5 months ago