atfortes / Awesome-Controllable-Diffusion
Papers and resources on Controllable Generation using Diffusion Models, including ControlNet, DreamBooth, and IP-Adapter.
⭐430 · Updated 4 months ago
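Since this list centers on techniques like ControlNet, a minimal sketch of the typical controllable-generation workflow may help orient readers: a conditioning image (here, Canny edges) steers the spatial layout of the output while the text prompt controls content. This is a sketch using the Hugging Face diffusers API, assuming `diffusers`, `transformers`, `opencv-python`, and a CUDA GPU are available; the input URL is a placeholder and the checkpoints are commonly used public ones.

```python
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image
from PIL import Image

# Pair a Canny-edge ControlNet with a Stable Diffusion v1.5 backbone.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Build the conditioning image: Canny edges of an input photo.
source = load_image("https://example.com/input.png")  # placeholder URL
edges = cv2.Canny(np.array(source), 100, 200)
edges = Image.fromarray(np.stack([edges] * 3, axis=-1))  # 1-channel -> 3-channel

# The edge map constrains layout; the prompt controls appearance.
result = pipe("a futuristic city at dusk", image=edges, num_inference_steps=30).images[0]
result.save("controlled_output.png")
```

The same pipeline pattern extends to other conditioning signals (depth, pose, segmentation) by swapping in the matching ControlNet checkpoint and preprocessor.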
Alternatives and similar repositories for Awesome-Controllable-Diffusion:
Users interested in Awesome-Controllable-Diffusion are comparing it to the repositories listed below.
- Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing imag… ⭐493 · Updated 9 months ago
- 🔥🔥🔥 A curated list of papers on LLM-based multimodal generation (image, video, 3D, and audio). ⭐425 · Updated 3 weeks ago
- A repository for organizing papers, code, and other resources related to unified multimodal models. ⭐364 · Updated 3 weeks ago
- [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation ⭐415 · Updated 2 months ago
- (CVPR 2024) A benchmark for evaluating Multimodal LLMs using multiple-choice questions. ⭐327 · Updated last month
- Research Trends in LLM-guided Multimodal Learning. ⭐357 · Updated last year
- Code and models for the NeurIPS 2023 paper "Generating Images with Multimodal Language Models". ⭐447 · Updated last year
- LaVIT: Empower the Large Language Model to Understand and Generate Visual Content ⭐560 · Updated 4 months ago
- Paper list on multimodal and large language models, recording papers read from the daily arXiv for personal reference. ⭐585 · Updated this week
- Official implementation of SEED-LLaMA (ICLR 2024). ⭐597 · Updated 4 months ago
- ⭐308 · Updated last year
- A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems. ⭐239 · Updated 3 weeks ago
- Awesome_Multimodel is a curated GitHub repository that provides a comprehensive collection of resources for Multimodal Large Language Mod… ⭐292 · Updated 2 weeks ago
- Aligning LMMs with Factually Augmented RLHF ⭐345 · Updated last year
- [ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning ⭐269 · Updated 11 months ago
- MMICL, a state-of-the-art VLM with in-context learning (ICL) ability, from PKU ⭐342 · Updated last year
- ⭐160 · Updated 7 months ago
- ✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis ⭐456 · Updated 2 months ago
- A reading list on video generation ⭐489 · Updated this week
- [NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents ⭐307 · Updated 10 months ago
- [NeurIPS'24 Spotlight] Visual CoT: Advancing Multi-Modal Language Models with a Comprehensive Dataset and Benchmark for Chain-of-Thought … ⭐227 · Updated last month
- [ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP" ⭐746 · Updated 6 months ago
- A curated list of resources dedicated to hallucination in multimodal large language models (MLLMs). ⭐578 · Updated last month
- Recent advancements propelled by large language models (LLMs), encompassing an array of domains including Vision, Audio, Agent, Robotics,… ⭐117 · Updated this week
- Long Context Transfer from Language to Vision ⭐360 · Updated 2 months ago
- [ECCV 2024] ShareGPT4V: Improving Large Multi-modal Models with Better Captions ⭐193 · Updated 7 months ago
- This repo lists relevant papers summarized in our survey paper: A Systematic Survey of Prompt Engineering on Vision-Language Foundation … ⭐428 · Updated 3 months ago
- [CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding ⭐586 · Updated 2 weeks ago
- RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness ⭐291 · Updated 2 months ago
- [CVPR 2024 Highlight] OPERA: Alleviating Hallucination in Multi-Modal Large Language Models via Over-Trust Penalty and Retrospection-Allo… ⭐309 · Updated 5 months ago