raghavlite / B3
☆30 · Updated last week
Alternatives and similar repositories for B3
Users interested in B3 are comparing it to the libraries listed below.
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning ☆70 · Updated 5 months ago
- ☆21 · Updated 2 weeks ago
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c… ☆43 · Updated 11 months ago
- ☆37 · Updated last year
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆35 · Updated 4 months ago
- ☆91 · Updated last year
- A collection of visual instruction tuning datasets. ☆76 · Updated last year
- ☆133 · Updated last year
- Lion: Kindling Vision Intelligence within Large Language Models ☆51 · Updated last year
- The official implementation of RAR ☆92 · Updated last year
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆32 · Updated 7 months ago
- Official repository of the MMDU dataset ☆96 · Updated last year
- [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant ☆171 · Updated 3 months ago
- Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs ☆94 · Updated 9 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?" ☆58 · Updated 2 years ago
- ☆66 · Updated last year
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆70 · Updated 8 months ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 · Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆275 · Updated last year
- ☆155 · Updated last year
- The official repo for "TextCoT: Zoom In for Enhanced Multimodal Text-Rich Image Understanding" ☆43 · Updated last year
- ☆31 · Updated last year
- A huge dataset for Document Visual Question Answering ☆20 · Updated last year
- The official GitHub page for "What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins… ☆19 · Updated last year
- Official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval" ☆36 · Updated 3 months ago
- [ACM MM25] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs" ☆94 · Updated 2 months ago
- [NeurIPS 2024] Dense Connector for MLLMs ☆179 · Updated last year
- Official PyTorch implementation of Clover: Towards A Unified Video-Language Alignment and Fusion Model (CVPR 2023) ☆40 · Updated 2 years ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆31 · Updated 3 months ago
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ☆101 · Updated 2 years ago