XMUDeepLIT/LLaVE

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/XMUDeepLIT/LLaVE)

XMUDeepLIT / LLaVE

LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning

☆78

Alternatives and similar repositories for LLaVE

Users that are interested in LLaVE are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

haon-chen / mmE5
View on GitHub
☆59Feb 27, 2025Updated last year
TIGER-AI-Lab / ABC
View on GitHub
ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]
☆19Aug 21, 2025Updated 10 months ago
TIGER-AI-Lab / VLM2Vec
View on GitHub
This repo contains the code for "VLM2Vec / MMEB" [ICLR 2025], "VLM2Vec-V2 / MMEB-V2" [TMLR 2026], and "MMEB-V3" [COLM 2026]
☆667Jun 24, 2026Updated 3 weeks ago
Code-kunkun / LamRA
View on GitHub
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
☆182Jul 7, 2025Updated last year
friedrichor / UNITE
View on GitHub
official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"
☆42Jul 4, 2025Updated last year
Deploy on Railway without the complexity - Free Credits Offer • Ad
Connect your repo and Railway handles the rest with instant previews. Quickly provision container image services, databases, and storage volumes.
haon-chen / MoCa
View on GitHub
☆68Aug 14, 2025Updated 11 months ago
raghavlite / B3
View on GitHub
☆43Jan 12, 2026Updated 6 months ago
Tencent-QQMM / QQMM-embed
View on GitHub
☆25Jun 22, 2026Updated 3 weeks ago
haoyu-bu / CAFe
View on GitHub
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆33Mar 26, 2025Updated last year
XMUDeepLIT / UME-R1
View on GitHub
The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).
☆69Feb 25, 2026Updated 4 months ago
deepglint / UniME
View on GitHub
[ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆105Dec 8, 2025Updated 7 months ago
chs20 / fuselip
View on GitHub
FuseLIP: Multimodal Embeddings via Early Fusion of Discrete Tokens
☆17Sep 8, 2025Updated 10 months ago
wmt-conference / wmt23-news-systems
View on GitHub
☆14Oct 6, 2025Updated 9 months ago
kongds / E5-V
View on GitHub
E5-V: Universal Embeddings with Multimodal Large Language Models
☆275Dec 10, 2025Updated 7 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
deepglint / Victor
View on GitHub
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
☆29Aug 15, 2025Updated 11 months ago
Adaxry / Unified_Layer_Skipping
View on GitHub
☆15Apr 11, 2024Updated 2 years ago
GaryGuTC / UniME-v2
View on GitHub
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆74Dec 8, 2025Updated 7 months ago
LightDXY / MaskCLIP
View on GitHub
☆37Apr 13, 2023Updated 3 years ago
ZoengHN / Embed-RL
View on GitHub
☆44Jun 23, 2026Updated 3 weeks ago
chaxjli / U-MARVEL
View on GitHub
☆36Mar 24, 2026Updated 3 months ago
wmt-conference / wmt22-news-systems
View on GitHub
☆21Feb 13, 2023Updated 3 years ago
CnFaker / LLaVA-SP
View on GitHub
[ICCV 2025] The official pytorch implement of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".
☆24Oct 28, 2025Updated 8 months ago
OpenGVLab / TPO
View on GitHub
Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment
☆65Jul 22, 2025Updated 11 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
TIGER-AI-Lab / UniIR
View on GitHub
Official code for paper "UniIR: Training and Benchmarking Universal Multimodal Information Retrievers" (ECCV 2024)
☆183Oct 1, 2024Updated last year
microsoft / A-CLIP
View on GitHub
Official Implementation of Attentive Mask CLIP (ICCV2023, https://arxiv.org/abs/2212.08653)
☆37May 29, 2024Updated 2 years ago
xydaytoy / BMI-NMT
View on GitHub
☆11Jul 28, 2021Updated 4 years ago
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
chancharikmitra / SAVs
View on GitHub
Official Codebase for "Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers"
☆26Jun 7, 2025Updated last year
deepglint / DanQing
View on GitHub
The official repo for the DanQing dataset.
☆36Mar 25, 2026Updated 3 months ago
MCG-NJU / RGE
View on GitHub
Reasoning Guided Embeddings: Leveraging MLLM Reasoning for Improved Multimodal Retrieval
☆15Nov 29, 2025Updated 7 months ago
Victorwz / Open-Qwen2VL
View on GitHub
[COLM 2025] Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources
☆314Aug 25, 2025Updated 10 months ago
xiaoxing2001 / DeGLA
View on GitHub
[ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
☆16Jul 15, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
yshinya6 / clip-refine
View on GitHub
Code repository for "Post-pre-training for Modality Alignment in Vision-Language Foundation Models" (CVPR2025)
☆41Jul 25, 2025Updated 11 months ago
pppa2019 / swie_overmiss_llm4mt
View on GitHub
Code for "Improving Translation Faithfulness of Large Language Models via Augmenting Instructions"
☆12Aug 26, 2023Updated 2 years ago
tulip-berkeley / open_clip
View on GitHub
An open source implementation of CLIP (With TULIP Support)
☆165May 14, 2025Updated last year
360CVGroup / RzenEmbed
View on GitHub
Embedding model prioritized towards Multimodal RAG, overall + VisDoc double top1 on MMEB benchmark
☆36Jun 16, 2026Updated last month
pengfei-luo / ImageScope
View on GitHub
[WWW 2025 Oral] ImageScope: Unifying Language-Guided Image Retrieval via Large Multimodal Model Collective Reasoning
☆21Jul 2, 2025Updated last year
Jingfeng0705 / LIFT
View on GitHub
The official repo for LIFT: Language-Image Alignment with Fixed Text Encoders
☆43Jun 10, 2025Updated last year
BIGBALLON / BeyondCLIP
View on GitHub
Not a neutral survey — a field manual for engineers who build, train, and ship multimodal retrieval at production scale. The C-L-I triang…
☆79Apr 20, 2026Updated 3 months ago