Amshaker/Mobile-O

[CVPR'26 Demo] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device

☆154

Alternatives and similar repositories for Mobile-O

Users who are interested in Mobile-O are comparing it to the repositories listed below.

mbzuai-oryx / VideoMathQA
VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos
☆24 · May 7, 2026 · Updated 2 months ago
umair1221 / WorldCache
WorldCache: Content-Aware Caching for Accelerated Video World Models
☆21 · Jun 28, 2026 · Updated 3 weeks ago
sen-mao / Loopfree
[CVPR2025] Official Implementations "One-Way Ticket: Time-Independent Unified Encoder for Distilling Text-to-Image Diffusion Models"
☆29 · Mar 16, 2026 · Updated 4 months ago
mbzuai-oryx / EvoLMM
Self Evolving Large Multimodal Models with Continuous Rewards
☆25 · Jun 9, 2026 · Updated last month
mbzuai-oryx / Video-CoM
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
☆22 · Jun 17, 2026 · Updated last month
mbzuai-oryx / Video-R2
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
☆19 · Jan 21, 2026 · Updated 6 months ago
flying-sky999 / OmniV2V
☆15 · Jun 2, 2025 · Updated last year
hustvl / MobileI2V
[ArXiv 2025] MobileI2V: Fast and High-Resolution Image-to-Video on Mobile Devices
☆87 · May 20, 2026 · Updated 2 months ago
AniAggarwal / ecad
[ICLR 2026] Code for Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model
☆30 · Mar 1, 2026 · Updated 4 months ago
Amshaker / MAVOS
[WACV 2025] Efficient Video Object Segmentation via Modulated Cross-Attention Memory
☆61 · Feb 28, 2025 · Updated last year
sen-mao / FasterVAR
[ICML2026] Official Implementations "FasterVAR: Plug-and-Play Acceleration for Visual Autoregressive Models"
☆27 · Jul 9, 2026 · Updated last week
Hanzy1996 / OpenSeg-R
OpenSeg-R: Improving Open-Vocabulary Segmentation via Step-by-Step Visual Reasoning
☆29 · May 24, 2025 · Updated last year
mbzuai-oryx / Agent-X
ICLR 2026: Agent-X Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks
☆43 · Apr 28, 2026 · Updated 2 months ago
Amshaker / Mobile-VideoGPT
Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model
☆142 · Aug 6, 2025 · Updated 11 months ago
Fr0zenCrane / Uni-ViGU
Uni-ViGU: Towards Unified Video Generation and Understanding via A Diffusion-Based Video Generator
☆33 · Apr 15, 2026 · Updated 3 months ago
parameterlab / dr-llm
[ICLR 2026 🔥] Dr.LLM: Dynamic Layer Routing in LLMs
☆56 · Apr 24, 2026 · Updated 2 months ago
Amshaker / GroupMamba
[CVPR 2025] GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model
☆142 · Mar 22, 2025 · Updated last year
OpenGVLab / InternVL-U
InternVL-U is a 4B-parameter unified multimodal model (UMM) that brings multimodal understanding, reasoning, image generation, image edit…
☆291 · Mar 21, 2026 · Updated 4 months ago
OPPO-Mente-Lab / X2Edit
AAAI2026 X2Edit: Revisiting Arbitrary-Instruction Image Editing through Self-Constructed Data and Task-Aware Representation Learning
☆97 · Nov 21, 2025 · Updated 8 months ago
mbzuai-oryx / LongShOT
A Benchmark and Agentic Framework for Omni-Modal Reasoning and Tool Use in Long Videos
☆21 · Jun 20, 2026 · Updated last month
mbzuai-oryx / VideoMolmo
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆56 · Jul 5, 2025 · Updated last year
AMD-AGI / Nitro-E
Nitro-E is a family of text-to-image diffusion models focused on highly efficient training.
☆125 · Jun 4, 2026 · Updated last month
ByteVisionLab / DreamLite
[ECCV 2026] 🔥 Official impl. of "DreamLite: A Lightweight On-Device Unified Model for Image Generation and Editing".
☆731 · Jun 12, 2026 · Updated last month
ai-forever / VIBE
☆55 · Feb 9, 2026 · Updated 5 months ago
End2End-Diffusion / REPA-E
[ICCV 2025] Official implementation of the paper: REPA-E: Unlocking VAE for End-to-End Tuning of Latent Diffusion Transformers
☆511 · Dec 6, 2025 · Updated 7 months ago
xudonmao / PairEdit
☆26 · Nov 25, 2025 · Updated 7 months ago
ByteDance-Seed / BM-code
[Arxiv 2025] ByteMorph: Benchmarking Instruction-Guided Image Editing with Non-Rigid Motions
☆45 · Jun 11, 2025 · Updated last year
nusnlp / d2vlm
[ICCV 2025] Factorized Learning for Temporally Grounded Video-Language Models
☆24 · Apr 18, 2026 · Updated 3 months ago
zhang0jhon / diffusion-4k
[CVPR 2025] Diffusion-4K: Ultra-High-Resolution Image Synthesis with Latent Diffusion Models
☆365 · Nov 24, 2025 · Updated 7 months ago
sen-mao / FasterDiffusion-DiT
Official Implementations "Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference" for DiT (NeurIPS'24)
☆15 · Aug 3, 2025 · Updated 11 months ago
weichow23 / EditMGT
Official Repo for Paper <EditMGT Unleashing the Potential of Masked Generative Transformer in Image Editing>
☆79 · Dec 20, 2025 · Updated 7 months ago
VinAIResearch / SwiftBrush
SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation (CVPR 2024)
☆72 · Jun 24, 2026 · Updated 3 weeks ago
pnotp / ArcFlow
ArcFlow: Unleashing 2-Step Text-to-Image Generation via High-Precision Non-Linear Flow Distillation
☆128 · May 20, 2026 · Updated 2 months ago
THU-KEG / LongWriter-V
[ACM MM25] LongWriter-V: Enabling Ultra-Long and High-Fidelity Generation in Vision-Language Models
☆24 · Mar 29, 2025 · Updated last year
lifan724 / magic_eraser
☆20 · Jul 14, 2024 · Updated 2 years ago
ZitengWangNYU / Scale-RAE
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders
☆255 · Feb 13, 2026 · Updated 5 months ago
ByteVisionLab / NextFlow
NextFlow🚀: Unified Sequential Modeling Activates Multimodal Understanding and Generation
☆331 · Jan 9, 2026 · Updated 6 months ago
wuer5 / OMGSR
Official repo for "OMGSR: You Only Need One Mid-timestep Guidance for Real-World Image Super-Resolution"
☆118 · Mar 11, 2026 · Updated 4 months ago
hecoding / Hyper-Modulation
Official Implementation for "Transferring Unconditional to Conditional GANs with Hyper-Modulation" CVPRW 22 https://arxiv.org/abs/2112.02…
☆13 · Jun 28, 2022 · Updated 4 years ago