jialuli-luka/SELMA

Code and Data for Paper: SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data

☆35

Alternatives and similar repositories for SELMA

Users who are interested in SELMA are comparing it to the libraries listed below.

Yui010206 / Adaptive-Visual-Imagination-Control
When and How Much to Imagine: Adaptive Test-Time Scaling with World Models for Visual Spatial Reasoning
☆18 · Jun 2, 2026 · Updated last month
daeunni / Video-Skill-CoT
Code for "Skill-based Chain-of-Thoughts for Domain-Adaptive Video Reasoning [EMNLP 2025 Findings]"
☆18 · Aug 27, 2025 · Updated 10 months ago
Yui010206 / MEXA
[EMNLP 2025 Findings] MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
☆15 · Aug 22, 2025 · Updated 10 months ago
jaehong31 / RACCooN
(EMNLP 2025 Main) RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives
☆37 · Dec 20, 2025 · Updated 7 months ago
jialuli-luka / Video-MSG
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
☆28 · Apr 14, 2025 · Updated last year
07Agarg / HAF
Code for the Paper Learning Hierarchy Aware Features for Reducing Mistake Severity, accepted in ECCV 2022
☆15 · Dec 16, 2022 · Updated 3 years ago
wangbohan97 / MPS
☆13 · Jul 5, 2024 · Updated 2 years ago
daeunni / VideoRepair
Code for "VideoRepair: Improving Text-to-Video Generation via Misalignment Evaluation and Localized Refinement [ACL 2026 Findings]"
☆52 · Apr 7, 2026 · Updated 3 months ago
Yui010206 / VEGGIE-VidEdit
[ICCV2025] VEGGIE: Instructional Editing and Reasoning Video Concepts with Grounded Generation
☆34 · Aug 18, 2025 · Updated 11 months ago
RockeyCoss / SPO
[CVPR 2025] Aesthetic Post-Training Diffusion Models from Generic Preferences with Step-by-step Preference Optimization
☆271 · Apr 7, 2025 · Updated last year
MCG-NJU / OCSampler
[CVPR 2022] OCSampler: Compressing Videos to One Clip with Single-step Sampling
☆17 · Jun 21, 2022 · Updated 4 years ago
ExplainableML / ImageSelect
Code for the paper "If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection"
☆27 · Jul 10, 2023 · Updated 3 years ago
Yui010206 / Ego2Web
[CVPR 2026] Ego2Web: A Web Agent Benchmark Grounded in Egocentric Videos
☆29 · Mar 25, 2026 · Updated 3 months ago
snap-research / VIMI
☆13 · Jul 10, 2024 · Updated 2 years ago
lavoiems / DiscreteLatentCode
Official repository for the article Compositional Discrete Latent Code for High Fidelity, Productive Diffusion Models (https://arxiv.org/…
☆38 · Sep 5, 2025 · Updated 10 months ago
Yui010206 / CREMA
[ICLR 2025] CREMA: Generalizable and Efficient Video-Language Reasoning via Multimodal Modular Fusion
☆56 · Jul 1, 2025 · Updated last year
kodenii / Responsible-Visual-Editing
Responsible Visual Editing
☆15 · Jul 10, 2024 · Updated 2 years ago
Mars-tin / fast-spatial-mem
Fast Spatial Memory with Elastic Test-Time Training (4D-LRM + 4D-LVSM)
☆102 · Jun 20, 2026 · Updated last month
sunilhoho / EVEREST
Official Pytorch implementation of EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens [ICML2024].
☆31 · Jun 15, 2024 · Updated 2 years ago
passing2961 / DialogCC
Official code and dataset for our NAACL 2024 paper: DialogCC: An Automated Pipeline for Creating High-Quality Multi-modal Dialogue Datase…
☆13 · Jun 24, 2024 · Updated 2 years ago
HL-hanlin / Bifrost-1
Official implementation of Bifrost-1: Bridging Multimodal LLMs and Diffusion Models with Patch-level CLIP Latents (NeurIPS 2025)
☆47 · Nov 24, 2025 · Updated 7 months ago
ziqipang / ADDP
[ICLR 2025] Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
☆15 · Jul 4, 2025 · Updated last year
jaehong31 / SAFREE
[ICLR 2025] SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image and Video Generation
☆59 · Jan 22, 2025 · Updated last year
HAWLYQ / ET-Cap
☆24 · Oct 8, 2023 · Updated 2 years ago
Kwai-Klear / AR-GRPO
Training Autoregressive Image Generation models via Reinforcement Learning
☆53 · Nov 26, 2025 · Updated 7 months ago
aisagarw / awesome-explainable-cv
☆12 · Oct 17, 2024 · Updated last year
yigu1008 / Diffusion-RPO
☆15 · Mar 30, 2025 · Updated last year
meetdavidwan / crg
PyTorch code for "Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training"
☆39 · Mar 4, 2024 · Updated 2 years ago
ai-compiler-study / triton-kernels
Triton kernels for Flux
☆23 · Jul 7, 2025 · Updated last year
XuweiyiChen / UniCtrl
[TMLR] Official implementation of UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free U…
☆75 · Nov 29, 2024 · Updated last year
Yaofang-Liu / FVDM
Code for Paper 'Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach'
☆36 · Jan 2, 2026 · Updated 6 months ago
ExplainableML / HyperNoise
☆70 · Dec 5, 2025 · Updated 7 months ago
feizc / Video-In-Context
Video Diffusion Transformers are In-Context Learners
☆37 · Jan 6, 2025 · Updated last year
HL-hanlin / V-Co
Official implementation of V-Co: A Closer Look at Visual Representation Alignment via Co-Denoising (ECCV 2026)
☆27 · Jun 29, 2026 · Updated 3 weeks ago
ShihaoZhaoZSH / LaVi-Bridge
[ECCV 2024] Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
☆300 · Jul 17, 2024 · Updated 2 years ago
magcil / movie_shot_classification_dataset
A dataset with classified film shots
☆11 · Aug 8, 2022 · Updated 3 years ago
h6kplus / PhyMotion
Official implementation of paper "PhyMotion: Structured 3D Motion Reward for Physics-Grounded Human Video Generation"
☆42 · May 15, 2026 · Updated 2 months ago
ruchikachavhan / concept-prune
Code for the paper - ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning
☆24 · Aug 13, 2024 · Updated last year
WikiChao / ScalingConcept
☆24 · Nov 1, 2024 · Updated last year