raghavlite / B3
☆27 Updated 4 months ago
Alternatives and similar repositories for B3
Users who are interested in B3 are comparing it to the repositories listed below
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning ☆68 Updated 4 months ago
- A collection of visual instruction tuning datasets. ☆76 Updated last year
- ☆91 Updated last year
- ☆133 Updated last year
- ☆20 Updated 3 weeks ago
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment ☆57 Updated last year
- [CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant ☆164 Updated 3 months ago
- ☆37 Updated last year
- Turning to Video for Transcript Sorting ☆48 Updated 2 years ago
- Lion: Kindling Vision Intelligence within Large Language Models ☆51 Updated last year
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?". ☆58 Updated 2 years ago
- Official repository of MMDU dataset ☆95 Updated last year
- ☆80 Updated 10 months ago
- The official implementation of RAR ☆92 Updated last year
- Harnessing 1.4M GPT4V-synthesized Data for A Lite Vision-Language Model ☆273 Updated last year
- [ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption ☆99 Updated 2 years ago
- SVIT: Scaling up Visual Instruction Tuning ☆163 Updated last year
- Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original c… ☆42 Updated 10 months ago
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives ☆39 Updated last month
- Official repository for CoMM Dataset ☆48 Updated 9 months ago
- Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models" ☆32 Updated 6 months ago
- LLaVA-NeXT-Image-Llama3-Lora, Modified from https://github.com/arielnlee/LLaVA-1.6-ft ☆44 Updated last year
- [NeurIPS 2024] Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning ☆70 Updated 8 months ago
- [ECCV 2024] Official PyTorch implementation of DreamLIP: Language-Image Pre-training with Long Captions ☆136 Updated 5 months ago
- [NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training ☆216 Updated 6 months ago
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆155 Updated 10 months ago
- ☆65 Updated last year
- [NeurIPS 2024] Dense Connector for MLLMs ☆177 Updated 11 months ago
- Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types ☆30 Updated 2 months ago
- WeThink: Toward General-purpose Vision-Language Reasoning via Reinforcement Learning ☆35 Updated 4 months ago