xiangyu-mm / EasyGen
The official code for paper "EasyGen: Easing Multimodal Generation with a Bidirectional Conditional Diffusion Model and LLMs"
☆73Updated last month
Alternatives and similar repositories for EasyGen:
Users that are interested in EasyGen are comparing it to the libraries listed below
- ☆94Updated last year
- ☆59Updated 11 months ago
- Official code for "What Makes for Good Visual Tokenizers for Large Language Models?".☆56Updated last year
- [NeurIPS'24] Official PyTorch Implementation of Seeing the Image: Prioritizing Visual Correlation by Contrastive Alignment☆56Updated 3 months ago
- ☆87Updated last year
- [ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model☆32Updated 2 months ago
- ☆47Updated last year
- Touchstone: Evaluating Vision-Language Models by Language Models☆80Updated last year
- MLLM-Bench: Evaluating Multimodal LLMs with Per-sample Criteria☆59Updated 3 months ago
- MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models☆58Updated 4 months ago
- ChatBridge, an approach to learning a unified multimodal model to interpret, correlate, and reason about various modalities without rely…☆49Updated last year
- [Arxiv] Aligning Modalities in Vision Large Language Models via Preference Fine-tuning☆78Updated 8 months ago
- The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity". Th…☆42Updated 2 months ago
- code for "Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization"☆51Updated 4 months ago
- The codebase for our EMNLP24 paper: Multimodal Self-Instruct: Synthetic Abstract Image and Visual Reasoning Instruction Using Language Mo…☆67Updated last month
- Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective (ACL 2024)☆40Updated 2 months ago
- Enhancing Large Vision Language Models with Self-Training on Image Comprehension.☆62Updated 7 months ago
- ☆132Updated last year
- VideoHallucer, The first comprehensive benchmark for hallucination detection in large video-language models (LVLMs)☆24Updated 6 months ago
- Official repo for StableLLAVA☆94Updated last year
- LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation☆126Updated last year
- ☆54Updated 9 months ago
- Official implementation for "MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation?"☆41Updated last month
- [2024-ACL]: TextBind: Multi-turn Interleaved Multimodal Instruction-following in the Wildrounded Conversation☆47Updated last year
- Official implementation of our paper "Finetuned Multimodal Language Models are High-Quality Image-Text Data Filters".☆43Updated 2 weeks ago
- VideoNIAH: A Flexible Synthetic Method for Benchmarking Video MLLMs☆33Updated 2 months ago
- This is the official repo for Debiasing Large Visual Language Models, including a Post-Hoc debias method and Visual Debias Decoding strat…☆76Updated 9 months ago
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆95Updated 2 months ago
- ✨✨ MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?☆88Updated last month
- [NeurIPS 2023] Bootstrapping Vision-Language Learning with Decoupled Language Pre-training☆24Updated last year