Amshaker/Mobile-VideoGPT

Mobile-VideoGPT: Fast and Accurate Video Understanding Language Model

☆142

Alternatives and similar repositories for Mobile-VideoGPT

Users who are interested in Mobile-VideoGPT are comparing it to the repositories listed below.

mbzuai-oryx / ClimateGPT
[EMNLP'23] ClimateGPT: a specialized LLM for conversations related to Climate Change and Sustainability topics in both English and Arabi…
☆79 · Sep 24, 2024 · Updated last year
mbzuai-oryx / VideoMathQA
VideoMathQA is a benchmark designed to evaluate mathematical reasoning in real-world educational videos
☆24 · May 7, 2026 · Updated 2 months ago
Amshaker / MAVOS
[WACV 2025] Efficient Video Object Segmentation via Modulated Cross-Attention Memory
☆61 · Feb 28, 2025 · Updated last year
mbzuai-oryx / Video-CoM
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
☆22 · Jun 17, 2026 · Updated last month
mbzuai-oryx / VideoGPT-plus
Official Repository of paper VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
☆293 · Aug 5, 2025 · Updated 11 months ago
Amshaker / GroupMamba
[CVPR 2025] GroupMamba: Parameter-Efficient and Accurate Group Visual State Space Model
☆142 · Mar 22, 2025 · Updated last year
wzk1015 / Awesome-Vision-to-Music-Generation
[ISMIR 2025] A curated list of vision-to-music generation: methods, datasets, evaluation and challenges.
☆126 · Aug 9, 2025 · Updated 11 months ago
shuzhangzhong / HybriMoE-Preview
☆17 · Apr 9, 2025 · Updated last year
mbzuai-oryx / CVRR-Evaluation-Suite
[CVPRW-25 MMFM] Official repository of paper titled "How Good is my Video LMM? Complex Video Reasoning and Robustness Evaluation Suite fo…
☆50 · Aug 23, 2024 · Updated last year
mbzuai-oryx / Video-R2
Video-R2: Reinforcing Consistent and Grounded Reasoning in Multimodal Language Models
☆19 · Jan 21, 2026 · Updated 6 months ago
umair1221 / WorldCache
WorldCache: Content-Aware Caching for Accelerated Video World Models
☆21 · Jun 28, 2026 · Updated 3 weeks ago
OpenGVLab / FluxViT
Make Your Training Flexible: Towards Deployment-Efficient Video Models
☆40 · Jun 11, 2025 · Updated last year
mbzuai-oryx / XrayGPT
[BIONLP@ACL 2024] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models.
☆530 · Aug 8, 2024 · Updated last year
asif-hanif / vafa
[MICCAI 2023] Official code repository of paper titled "Frequency Domain Adversarial Training for Robust Volumetric Medical Segmentation"…
☆52 · Nov 14, 2023 · Updated 2 years ago
mominabbass / LinC
Code for "Enhancing In-context Learning via Linear Probe Calibration"
☆38 · Apr 24, 2024 · Updated 2 years ago
mbzuai-oryx / VideoMolmo
Official code of the paper "VideoMolmo: Spatio-Temporal Grounding meets Pointing"
☆56 · Jul 5, 2025 · Updated last year
Amshaker / Mobile-O
[CVPR'26 Demo] Mobile-O: Unified Multimodal Understanding and Generation on Mobile Device
☆154 · Apr 13, 2026 · Updated 3 months ago
EsmaeilNarimissa / aws-sft-grpo-budget-llm-finetune
☆19 · May 17, 2025 · Updated last year
mmaaz60 / EdgeNeXt
[CADL'22, ECCVW] Official repository of paper titled "EdgeNeXt: Efficiently Amalgamated CNN-Transformer Architecture for Mobile Vision Ap…
☆417 · Jul 25, 2023 · Updated 3 years ago
Muhammad-Huzaifaa / ObjectCompose
[ACCV 2024] ObjectCompose: Evaluating Resilience of Vision-Based Models on Object-to-Background Compositional Changes 🚀🚀🚀
☆37 · Jan 21, 2025 · Updated last year
Mark12Ding / Dispider
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
☆180 · Mar 23, 2025 · Updated last year
mbzuai-oryx / Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the cap…
☆1,505 · Aug 5, 2025 · Updated 11 months ago
TencentARC / SEED-Bench-R1
☆100 · Jun 23, 2025 · Updated last year
mbzuai-oryx / EvoLMM
Self Evolving Large Multimodal Models with Continuous Rewards
☆25 · Jun 9, 2026 · Updated last month
ZBox1005 / CoT-UQ
[ACL 2025] "CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought"
☆17 · Apr 3, 2025 · Updated last year
mmaaz60 / mdef_detr
☆11 · May 9, 2023 · Updated 3 years ago
HashmatShadab / Robust-LLaVA
[ICCVW 2025 (Oral)] Robust-LLaVA: On the Effectiveness of Large-Scale Robust Image Encoders for Multi-modal Large Language Models
☆29 · Oct 20, 2025 · Updated 9 months ago
abdohelmy / D-3Former
Official repository of paper titled "D3Former: Debiased Dual Distilled Transformer for Incremental Learning".
☆25 · Jul 10, 2023 · Updated 3 years ago
mbzuai-oryx / ARB
ARB: A Comprehensive Arabic Multimodal Reasoning Benchmark
☆17 · May 25, 2025 · Updated last year
XiaoduoAILab / XmodelVLM
☆68 · Jun 20, 2024 · Updated 2 years ago
ShahinaKK / LG_SDG
Language Grounded Single Source Domain Generalization in Medical Image Segmentation [ISBI2024]
☆33 · Oct 27, 2024 · Updated last year
mbzuai-oryx / PALO
(WACV 2025 - Oral) Vision-language conversation in 10 languages including English, Chinese, French, Spanish, Russian, Japanese, Arabic, H…
☆85 · Aug 5, 2025 · Updated 11 months ago
rohit901 / VANE-Bench
[NAACL'25] Contains code and documentation for our VANE-Bench paper.
☆24 · Aug 19, 2025 · Updated 11 months ago
OpenGVLab / VideoChat-R1
[NIPS2025] VideoChat-R1 & R1.5: Enhancing Spatio-Temporal Perception and Reasoning via Reinforcement Fine-Tuning
☆268 · Oct 18, 2025 · Updated 9 months ago
HashmatShadab / HSAT
[MICCAI 2025] Hierarchical Self-Supervised Adversarial Training for Robust Vision Models in Histopathology
☆12 · Jun 17, 2025 · Updated last year
OpenGVLab / vinci
Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model
☆93 · Nov 27, 2025 · Updated 7 months ago
fazliimam / NoLA
NoLA Codebase
☆28 · May 31, 2026 · Updated last month
shenao-zhang / reward-augmented-preference
The official implementation of Preference Data Reward-Augmentation.
☆18 · May 1, 2025 · Updated last year
Amshaker / unetr_plus_plus
[IEEE TMI-2024] UNETR++: Delving into Efficient and Accurate 3D Medical Image Segmentation
☆532 · Dec 14, 2025 · Updated 7 months ago