YanFangCS/GenLIP

Official repo for "Let ViT Speak: Generative Language-Image Pre-training"

☆133

Alternatives and similar repositories for GenLIP

Users who are interested in GenLIP are comparing it to the repositories listed below.

jiaosiyuu / ThinkGen
ThinkGen: Generalized Thinking for Visual Generation
☆60 · Dec 30, 2025 · Updated 6 months ago
Xujxyang / OpenTrans
☆26 · Apr 17, 2024 · Updated 2 years ago
loongfeili / Martian-World-Model
[NeurIPS 2025] Official repo of "Martian World Model: Controllable Video Synthesis with Physically Accurate 3D Reconstructions"
☆20 · Aug 6, 2025 · Updated 11 months ago
linyiheng123 / MEMatte
Memory Efficient Matting with Adaptive Token Routing (AAAI 2025)
☆73 · Mar 30, 2026 · Updated 3 months ago
THUMAI-Lab / LLaVA-UHD-v4
☆46 · Jun 7, 2026 · Updated last month
bytedance / UniVR
☆17 · Updated this week
tencent-ailab / Penguin-VL
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders [Technical Report]
☆204 · Mar 30, 2026 · Updated 3 months ago
hustvl / SuperCLIP
☆140 · Dec 26, 2025 · Updated 6 months ago
facebookresearch / tuna-2
Official implementation of Tuna-2: Pixel Embeddings Beat Vision Encoders for Unified Understanding and Generation
☆738 · Updated this week
google-deepmind / tips
TIPSv2 (CVPR'26) and TIPS (ICLR'25)
☆572 · Jun 1, 2026 · Updated last month
TencentARC / MindOmni
[NeurIPS2025] The official implementation of MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
☆139 · Oct 15, 2025 · Updated 9 months ago
KevinLight831 / P9D
The download methods of Vision-language Continual Pretraining Dataset P9D.
☆12 · Jan 3, 2025 · Updated last year
UCSC-VLAA / OpenVision
OpenVision (ICCV 2025), OpenVision 2 (CVPR 2026), and OpenVision 3
☆487 · Feb 21, 2026 · Updated 5 months ago
Supercomputing-System-AI-Lab / InstantEdit
☆20 · Aug 15, 2025 · Updated 11 months ago
MiniMax-AI / VTP
[ECCV 2026] Towards Scalable Pre-training of Visual Tokenizers for Generation
☆495 · Apr 15, 2026 · Updated 3 months ago
Becomebright / MTV
Revisiting Multi-Task Visual Representation Learning
☆22 · Jan 21, 2026 · Updated 6 months ago
DripNowhy / Octopus
[ICML 2026] Official implementation for paper: Learning Self-Correction in Vision–Language Models via Rollout Augmentation
☆16 · Jun 4, 2026 · Updated last month
EvolvingLMMs-Lab / LLaVA-OneVision-2
Fully Open Framework for Democratized Multimodal Training
☆1,143 · Updated this week
kyegomez / open-moonvit
This is an ultra-simple, single-file PyTorch implementation of MoonViT, the native-resolution vision encoder from Kimi-VL.
☆28 · Apr 25, 2026 · Updated 2 months ago
WeChatCV / ObjEmbed
(ICML 2026) Official repository of paper "ObjEmbed: Towards Universal Multimodal Object Embeddings"
☆51 · May 18, 2026 · Updated 2 months ago
ZitengWangNYU / Scale-RAE
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
☆255 · Feb 13, 2026 · Updated 5 months ago
LAION-AI / scaling-laws-for-comparison
☆22 · May 12, 2026 · Updated 2 months ago
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆385 · Jun 20, 2026 · Updated last month
unica-visual-intelligence-lab / OmniRad
☆16 · Feb 3, 2026 · Updated 5 months ago
zlab-princeton / vero
Vero: An Open RL Recipe for General Visual Reasoning
☆134 · Jun 19, 2026 · Updated last month
ZoengHN / Embed-RL
☆44 · Jun 23, 2026 · Updated 3 weeks ago
GaryGuTC / UniME-v2
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆74 · Dec 8, 2025 · Updated 7 months ago
Hope7Happiness / minit2i-torch
Official PyTorch re-implementation of MiniT2I.
☆285 · Jun 24, 2026 · Updated 3 weeks ago
tliby / UniFork
UniFork: Exploring Modality Alignment for Unified Multimodal Understanding and Generation
☆48 · Aug 26, 2025 · Updated 10 months ago
hyj542682306 / Semantic-Frame-Interpolation
☆20 · Jul 8, 2025 · Updated last year
ATH-MaaS / Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
☆1,300 · Mar 24, 2026 · Updated 3 months ago
ShareLab-SII / CoMP-MM
Official repository of "CoMP: Continual Multimodal Pre-training for Vision Foundation Models"
☆48 · Apr 3, 2025 · Updated last year
x-cls / superclass
[NeurIPS 2024] Classification Done Right for Vision-Language Pre-Training
☆223 · Mar 20, 2025 · Updated last year
chuangchuangtan / C2P-CLIP-DeepfakeDetection
C2P-CLIP-DeepfakeDetection
☆101 · Dec 26, 2025 · Updated 6 months ago
facebookresearch / metaquery
Official Implementation of Paper Transfer between Modalities with MetaQueries
☆324 · Oct 12, 2025 · Updated 9 months ago
ByteDance-Seed / SAIL
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
☆85 · Oct 29, 2025 · Updated 8 months ago
PKU-YuanGroup / UniWorld
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation
☆883 · Dec 23, 2025 · Updated 6 months ago
jiaosiyu1999 / MAFT
☆60 · Aug 12, 2024 · Updated last year
YihanHu-2022 / DiffMatte
☆113 · Jul 4, 2024 · Updated 2 years ago