EvolvingLMMs-Lab / LLaVA-OneVision-1.5-RL

Fully Open Framework for Democratized Multimodal Reinforcement Learning.

☆40

Alternatives and similar repositories for LLaVA-OneVision-1.5-RL

Users who are interested in LLaVA-OneVision-1.5-RL are comparing it to the repositories listed below.

deepglint / RealSyn
[ACM MM2025] The official repository for the RealSyn dataset
☆40 Updated last month
TencentARC / ViSFT
☆37 Updated 2 years ago
Qinying-Liu / TagAlign
Official implementation of TagAlign
☆35 Updated last year
Kwai-YuanQi / TaskGalaxy
Scaling Multi-modal Instruction Fine-tuning with Tens of Thousands Vision Task Types
☆33 Updated 6 months ago
mightyzau / InfMLLM
☆19 Updated 2 years ago
lorebianchi98 / FG-CLIP
[CBMI 2024 Best Paper] Official repository of the paper "Is CLIP the main roadblock for fine-grained open-world perception?".
☆32 Updated 9 months ago
OliverRensu / DeepMIM
[WACV2025 Oral] DeepMIM: Deep Supervision for Masked Image Modeling
☆56 Updated 9 months ago
StanfordMIMI / villa
[ICCV 2023] ViLLA: Fine-grained vision-language representation learning from real-world data
☆46 Updated 2 years ago
alibaba / conv-llava
☆124 Updated last year
callsys / TextVR
[PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension
☆28 Updated 2 years ago
zhjohnchan / SK-VG
[CVPR-2023] The official dataset of Advancing Visual Grounding with Scene Knowledge: Benchmark and Method.
☆33 Updated 2 years ago
tripletclip / TripletCLIP
[NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"
☆46 Updated last year
OliverRensu / D-iGPT
[ICML 2024] This repository includes the official implementation of our paper "Rejuvenating image-GPT as Strong Visual Representation Lea…
☆98 Updated last year
hammoudhasan / SynthCLIP
Code base of SynthCLIP: CLIP training with purely synthetic text-image pairs from LLMs and TTIs.
☆102 Updated 10 months ago
AdamRain / YFCC15M_downloader
A subset of YFCC100M. Tools, checking scripts, and web drive links for downloading the datasets (uncompressed).
☆19 Updated last year
weijiawu / Awesome-Synthetic-Data-for-Perception-Task
☆43 Updated 2 years ago
iancovert / locality-alignment
☆54 Updated last year
UCSC-VLAA / CLIPS
An Enhanced CLIP Framework for Learning with Synthetic Captions
☆39 Updated 9 months ago
deepglint / ALIP
[ICCV 2023] ALIP: Adaptive Language-Image Pre-training with Synthetic Caption
☆104 Updated 2 years ago
palchenli / VL-Instruction-Tuning
☆92 Updated 2 years ago
yuecao0119 / MMFuser
The official implementation of the paper "MMFuser: Multimodal Multi-Layer Feature Fuser for Fine-Grained Vision-Language Understanding". …
☆62 Updated last year
LiBingyu01 / FGA-seg
Fine-Grained Pixel-Text Alignment for Open-Vocabulary Semantic Segmentation
☆15 Updated 4 months ago
Letian2003 / MM_INF
An efficient multi-modal instruction-following data synthesis tool and the official implementation of Oasis https://arxiv.org/abs/2503.08…
☆37 Updated 8 months ago
opendatalab / MLLM-DataEngine
MLLM-DataEngine: An Iterative Refinement Approach for MLLM
☆48 Updated last year
GaryGuTC / UniME-v2
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆60 Updated 2 months ago
cv516Buaa / OV-VG
☆32 Updated last year
jeykigung / HiCLIP
☆30 Updated 2 years ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆101 Updated 6 months ago
opendatalab / CLIP-Parrot-Bias
ECCV2024_Parrot Captions Teach CLIP to Spot Text
☆66 Updated last year
yoctta / XPaste
☆53 Updated 2 years ago