atfortes / Awesome-Controllable-Diffusion
Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, and IP-Adapter.
☆446 · Updated last week
Alternatives and similar repositories for Awesome-Controllable-Diffusion:
Users interested in Awesome-Controllable-Diffusion are comparing it to the libraries listed below.
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ☆427 · Updated 3 months ago
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ☆448 · Updated last year
- Research Trends in LLM-guided Multimodal Learning. ☆357 · Updated last year
- A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio). ☆441 · Updated last week
- A repository for organizing papers, code, and other resources related to unified multimodal models. ☆422 · Updated last week
- Official implementation of SEED-LLaMA (ICLR 2024). ☆604 · Updated 6 months ago
- MMICL, a state-of-the-art VLM with in-context learning ability, from PKU. ☆346 · Updated last year
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ☆507 · Updated 11 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ☆332 · Updated 2 months ago
- ☆320 · Updated last year
- A survey on multimodal learning research. ☆322 · Updated last year
- ☆165 · Updated 8 months ago
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ☆570 · Updated 5 months ago
- A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems. ☆263 · Updated 3 weeks ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ☆307 · Updated last week
- Aligning LMMs with Factually Augmented RLHF ☆359 · Updated last year
- Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens" ☆864 · Updated 3 months ago
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ☆272 · Updated last year
- A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs). ☆603 · Updated 3 months ago
- MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities (ICML 2024) ☆291 · Updated 2 months ago
- [NeurIPS 2023] Official implementations of "Cheap and Quick: Efficient Vision-Language Instruction Tuning for Large Language Models" ☆518 · Updated last year
- A paper list of multimodal and large language models, kept as a personal record of papers read from the daily arXiv feed. ☆611 · Updated this week
- Official PyTorch implementation of the paper "In-Context Learning Unlocked for Diffusion Models" ☆399 · Updated last year
- [CVPR'24] HallusionBench: You See What You Think? Or You Think What You See? An Image-Context Reasoning Benchmark Challenging for GPT-4V(… ☆273 · Updated 4 months ago
- LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusi… ☆465 · Updated 6 months ago
- The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision M… ☆496 · Updated last year
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo… ☆323 · Updated 7 months ago
- A reading list of video generation. ☆524 · Updated this week
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ☆274 · Updated 3 months ago
- PyTorch implementation of InstructDiffusion, a unifying and generic framework for aligning computer vision tasks with human instructions. ☆421 · Updated 10 months ago