raghavlite / B3
☆30 · Updated last week
Alternatives and similar repositories for B3
Users interested in B3 are comparing it to the libraries listed below.
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning ☆70 · Updated 5 months ago
- ☆21 · Updated 2 weeks ago
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c… ☆43 · Updated 11 months ago
- ☆37 · Updated last year
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆35 · Updated 4 months ago
- ☆91 · Updated last year
- A collection of visual instruction tuning datasets. ☆76 · Updated last year
- ☆133 · Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models ☆51 · Updated last year
- The official implementation of RAR ☆92 · Updated last year
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆32 · Updated 7 months ago
- Official repository of the MMDU dataset ☆96 · Updated last year
- [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant ☆171 · Updated 3 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆94 · Updated 9 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆58 · Updated 2 years ago
- ☆66 · Updated last year
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆70 · Updated 8 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆275 · Updated last year
- ☆155 · Updated last year
- The official repo for "TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding" ☆43 · Updated last year
- ☆31 · Updated last year
- A huge dataset for Document Visual Question Answering ☆20 · Updated last year
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins… ☆19 · Updated last year
- Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆36 · Updated 3 months ago
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆94 · Updated 2 months ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆179 · Updated last year
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR 2023) ☆40 · Updated 2 years ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆31 · Updated 3 months ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ☆101 · Updated 2 years ago