CyberAgentAILab / webcolor
Official implementation of Generative Colorization of Structured Mobile Web Pages, WACV 2023.
☆22Updated last year
Alternatives and similar repositories for webcolor:
Users who are interested in webcolor are comparing it to the libraries listed below.
- ☆21Updated last year
- [CVPR 2023 highlight] Towards Flexible Multi-modal Document Models☆56Updated last year
- Source code of the TextLap model, an LLM for text-2-layout generation.☆14Updated 5 months ago
- [CVPR 2024] Official PyTorch implementation of "ECLIPSE: Revisiting the Text-to-Image Prior for Efficient Image Generation"☆62Updated 10 months ago
- A curated list of papers and resources for text-to-image evaluation.☆28Updated last year
- An interactive demo based on Segment-Anything for stroke-based painting which enables human-like painting.☆34Updated last year
- INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model☆42Updated 7 months ago
- ☆33Updated last year
- Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data☆33Updated last year
- ☆16Updated 7 months ago
- ☆19Updated last year
- LLaVA combined with the Magvit image tokenizer, training an MLLM without a vision encoder and unifying image understanding and generation.☆35Updated 9 months ago
- ☆19Updated last year
- ☆13Updated 2 months ago
- ECCV2024_Parrot Captions Teach CLIP to Spot Text☆64Updated 6 months ago
- Official implementation of "Is This Loss Informative? Faster Text-to-Image Customization by Tracking Objective Dynamics" (NeurIPS 2023)☆37Updated last year
- The official repository of paper "ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection" (N…☆50Updated last year
- Minimal Differentiable Image Reward Functions☆51Updated 3 weeks ago
- Code Release for the paper "Make-A-Story: Visual Memory Conditioned Consistent Story Generation" in CVPR 2023☆39Updated last year
- ☆12Updated 2 months ago
- ☆22Updated last year
- [PR 2024] A large Cross-Modal Video Retrieval Dataset with Reading Comprehension☆25Updated last year
- [ECCV2022] Mind the Gap in Distilling StyleGANs☆29Updated last year
- Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".☆18Updated 3 years ago
- FuseCap: Leveraging Large Language Models for Enriched Fused Image Captions☆55Updated 11 months ago
- PyTorch implementation of HyperLLaVA: Dynamic Visual and Language Expert Tuning for Multimodal Large Language Models☆28Updated last year
- Official repository for the paper "VQDM: Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization"☆33Updated 6 months ago
- [NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…☆35Updated 9 months ago
- Official code for SeMani (CVPR 2020 oral and Journal extension)☆23Updated last year
- imagetokenizer is a Python package that helps you encode visuals and generate visual token IDs from a codebook; supports both image and video…☆31Updated 9 months ago