A-new-b / flex_edit
we propose FlexEdit, an end-to-end image editing method that leverages both free-shape masks and language instructions for Flexible Editing.
☆24Updated 2 months ago
Related projects ⓘ
Alternatives and complementary repositories for flex_edit
- Official Pytorch Implementation of Self-emerging Token Labeling☆30Updated 7 months ago
- ☆59Updated 5 months ago
- ChatterBox: Multi-round Multimodal Referring and Grounding, Multimodal, Multi-round dialogues☆50Updated 6 months ago
- Web2Code: A Large-scale Webpage-to-Code Dataset and Evaluation Framework for Multimodal LLMs☆62Updated 3 weeks ago
- [ECCV'24 Workshops Oral] DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling☆27Updated 2 weeks ago
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment☆31Updated 4 months ago
- Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"☆30Updated 3 weeks ago
- Project for "LaSagnA: Language-based Segmentation Assistant for Complex Queries".☆47Updated 6 months ago
- Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegrained Image …☆55Updated last month
- ☆23Updated last week
- ☆25Updated last year
- Official implementation of Add-SD: Rational Generation without Manual Reference.☆26Updated 3 months ago
- [IJCV 2024] MosaicFusion: Diffusion Models as Data Augmenters for Large Vocabulary Instance Segmentation☆112Updated last month
- Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want☆61Updated 3 weeks ago
- This is the official repository of our paper "What If We Recaption Billions of Web Images with LLaMA-3 ?"☆121Updated 5 months ago
- ☆30Updated this week
- PyTorch code for "ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning"☆17Updated 3 weeks ago
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models☆30Updated this week
- ☆30Updated 3 months ago
- A Framework for Decoupling and Assessing the Capabilities of VLMs☆38Updated 4 months ago
- How Good is Google Bard's Visual Understanding? An Empirical Study on Open Challenges☆30Updated last year
- AAPL: Adding Attributes to Prompt Learning for Vision-Language Models (CVPRw 2024)☆31Updated 6 months ago
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings"☆110Updated 3 months ago
- Code for our ICLR 2024 paper "PerceptionCLIP: Visual Classification by Inferring and Conditioning on Contexts"☆76Updated 6 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆115Updated last week
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆39Updated 3 months ago
- Object Recognition as Next Token Prediction (CVPR 2024 Highlight)☆161Updated last month
- LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture☆179Updated last month
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents"☆38Updated 7 months ago
- Official implementation of "Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models"☆35Updated 10 months ago