haoyu-bu/CAFe

Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"

☆33

Alternatives and similar repositories for CAFe

Users who are interested in CAFe are comparing it to the libraries listed below.

GaryGuTC / UniME-v2
[AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"
☆72 · Dec 8, 2025 · Updated 5 months ago
Code-kunkun / LamRA
[CVPR 2025] LamRA: Large Multimodal Model as Your Advanced Retrieval Assistant
☆181 · Jul 7, 2025 · Updated 10 months ago
longmalongma / TW-GRPO
The official repository of our paper "Reinforcing Video Reasoning with Focused Thinking"
☆35 · Jun 12, 2025 · Updated 11 months ago
chaxjli / U-MARVEL
☆36 · Mar 24, 2026 · Updated 2 months ago
raghavlite / B3
☆40 · Jan 12, 2026 · Updated 4 months ago
XMUDeepLIT / LLaVE
LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning
☆77 · May 23, 2025 · Updated last year
PKU-YuanGroup / LLMBind
LLMBind: A Unified Modality-Task Integration Framework
☆19 · Jun 16, 2024 · Updated last year
kongds / E5-V
E5-V: Universal Embeddings with Multimodal Large Language Models
☆275 · Dec 10, 2025 · Updated 5 months ago
ExplainableML / cosmos
[CVPR 2025] COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
☆41 · Mar 27, 2025 · Updated last year
saccharomycetes / visual_crop_zsvqa
☆12 · Apr 10, 2024 · Updated 2 years ago
WangFei-2019 / SNARE
Project for SNARE benchmark
☆11 · Jun 5, 2024 · Updated last year
PKU-YuanGroup / UniSandBox
Does Understanding Inform Generation in Unified Multimodal Models? From Analysis to Path Forward
☆60 · Nov 27, 2025 · Updated 6 months ago
PKU-YuanGroup / PiCO
[ICLR'25] PiCO: Peer Review in LLMs based on the Consistency Optimization, https://arxiv.org/pdf/2402.01830
☆36 · Feb 16, 2025 · Updated last year
uvavision / SyViC
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13 · Sep 30, 2023 · Updated 2 years ago
sssaury / HAM
Code for Modeling Thousands of Human Annotators for Generalizable Text-to-Image Person Re-identification (CVPR2025)
☆47 · Nov 4, 2025 · Updated 6 months ago
taewhankim / VIPCAP
☆14 · Dec 31, 2024 · Updated last year
google-deepmind / geckonum_benchmark_t2i
GeckoNum Benchmark for T2I Model Eval.
☆15 · Dec 5, 2024 · Updated last year
TIGER-AI-Lab / VLM2Vec
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR 2025]
☆647 · Updated this week
archiki / RepARe
☆21 · Oct 10, 2023 · Updated 2 years ago
princeton-pli / VLM_S2H
Generalizing from SIMPLE to HARD Visual Reasoning: Can We Mitigate Modality Imbalance in VLMs?
☆19 · Jun 3, 2025 · Updated 11 months ago
UMass-Embodied-AGI / FlexAttention
[ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models
☆47 · Jan 8, 2025 · Updated last year
amitakamath / vl_text_encoders_are_bottlenecks
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11 · May 24, 2023 · Updated 3 years ago
vividblueprint / SE-RideSharingService-architecture
This software architecture document aims to provide a detailed overview of the architecture of a ride-sharing service, including the key …
☆12 · May 11, 2025 · Updated last year
bruceyo / V-PETL
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
☆26 · Oct 13, 2022 · Updated 3 years ago
elad-amrani / xtra
PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025
☆14 · Nov 21, 2025 · Updated 6 months ago
YCaigogogo / CODER
☆22 · Apr 27, 2024 · Updated 2 years ago
deepglint / UniME
[ACM MM 2025] The official code of "Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs"
☆106 · Dec 8, 2025 · Updated 5 months ago
TAU-VAILab / hierarcaps
Code and data for the paper "Emergent Visual-Semantic Hierarchies in Image-Text Representations" (ECCV 2024)
☆34 · Aug 12, 2024 · Updated last year
adobe-research / llava-score
☆11 · Oct 2, 2024 · Updated last year
pixeli99 / MixLN
[ICLR 2025] Official PyTorch Implementation of "Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN" by Pengxia…
☆30 · Jul 24, 2025 · Updated 10 months ago
i2vec / MM-R5
The official repository of MM-R5
☆29 · Jun 22, 2025 · Updated 11 months ago
HowardLi1984 / ECDFormer
【Nature Computational Science 2025🔥】Deep peak property learning for efficient chiral molecules ECD spectra prediction
☆51 · Jan 12, 2025 · Updated last year
Zi-hao-Wei / Efficient-Vision-Language-Pre-training-by-Cluster-Masking
[CVPR 2024] Improving language-visual pretraining efficiency by performing cluster-based masking on images.
☆33 · May 16, 2024 · Updated 2 years ago
HanSolo9682 / CounterCurate
This is the implementation of CounterCurate, the data curation pipeline for both physical and semantic counterfactual image-caption pairs.
☆19 · Jun 27, 2024 · Updated last year
daixiangzi / VAR-CLIP
Implements VAR+CLIP for text-to-image (T2I) generation
☆147 · Jan 23, 2025 · Updated last year
UCSC-VLAA / Recap-DataComp-1B
[ICML 2025] This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"
☆151 · Jun 13, 2024 · Updated last year
tulip-berkeley / open_clip
An open source implementation of CLIP (With TULIP Support)
☆165 · May 14, 2025 · Updated last year
jasonbian97 / flowwalk
Implementation for flowwalk
☆33 · Mar 27, 2022 · Updated 4 years ago
zhangbw17 / MV-Adapter
An official PyTorch implementation of the paper: [MV-Adapter: Multimodal Video Transfer Learning for Video Text Retrieval].
☆14 · Jul 27, 2024 · Updated last year