FrankYang-17/MME-VideoOCR

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/FrankYang-17/MME-VideoOCR)

FrankYang-17 / MME-VideoOCR

☆40

Alternatives and similar repositories for MME-VideoOCR

Users that are interested in MME-VideoOCR are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

FrankYang-17 / Mavors
View on GitHub
☆16May 30, 2025Updated last year
FrankYang-17 / RealUnify
View on GitHub
☆28Oct 10, 2025Updated 9 months ago
MME-Benchmarks / MME-Unify
View on GitHub
✨✨ [ICLR 2026] MME-Unify: A Comprehensive Benchmark for Unified Multimodal Understanding and Generation Models
☆43Apr 10, 2025Updated last year
VidCapBench / VidCapBench
View on GitHub
☆13May 17, 2025Updated last year
OmniMMI / M4
View on GitHub
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
☆19Apr 2, 2025Updated last year
GPU virtual machines on DigitalOcean Gradient AI • Ad
Get to production fast with high-performance AMD and NVIDIA GPUs you can spin up in seconds. The definition of operational simplicity.
VITA-MLLM / Sparrow
View on GitHub
Sparrow: Data-Efficient Video-LLM with Text-to-Image Augmentation
☆32Mar 28, 2025Updated last year
Kwai-YuanQi / MM-RLHF
View on GitHub
The Next Step Forward in Multimodal LLM Alignment
☆199May 1, 2025Updated last year
ULMEvalKit / ULMEvalKit
View on GitHub
ULMEvalKit: One-Stop Eval ToolKit for Image Generation
☆56Dec 17, 2025Updated 7 months ago
OmniMMI / OmniMMI
View on GitHub
[CVPR 2025] OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts
☆23Jul 14, 2026Updated 2 weeks ago
Ryann-Ran / Scone
View on GitHub
(CVPR 2026 Highlight) Official repository for Scone (Subject-driven COmposition and DistinctioN Enhancement) model, supporting subject co…
☆32Apr 9, 2026Updated 3 months ago
hwanyu112 / VIBE-Benchmark
View on GitHub
☆27Feb 3, 2026Updated 5 months ago
MME-Benchmarks / MME-RealWorld
View on GitHub
✨✨ [ICLR 2025] MME-RealWorld: Could Your Multimodal LLM Challenge High-Resolution Real-World Scenarios that are Difficult for Humans?
☆161Oct 21, 2025Updated 9 months ago
Henry839 / PaperMaster
View on GitHub
☆15Apr 14, 2026Updated 3 months ago
aiha-lab / InfiniPot-V
View on GitHub
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20Jan 25, 2026Updated 6 months ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
OpenGVLab / VRBench
View on GitHub
[ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos
☆28Jun 4, 2026Updated last month
OmniMMI / OpenOmniNexus
View on GitHub
a fully open-source implementation of a GPT-4o-like speech-to-speech video understanding model.
☆38Apr 7, 2025Updated last year
MiG-NJU / PersonaVLM
View on GitHub
[CVPR 2026 Highlight] PersonaVLM: Long-Term Personalized Multimodal LLMs
☆112Apr 16, 2026Updated 3 months ago
Abe404 / mask_to_dicom_rtstruct
View on GitHub
Convert numpy mask to dicom rtstruct
☆13Feb 28, 2023Updated 3 years ago
synvo-ai / HippoCamp
View on GitHub
A benchmark for evaluating contextual agents on realistic multimodal personal-computer environments with profiling and factual-retention …
☆29Apr 2, 2026Updated 3 months ago
yfzhang114 / r1_reward
View on GitHub
✨✨ [ICLR 2026] R1-Reward: Training Multimodal Reward Model Through Stable Reinforcement Learning
☆291May 9, 2025Updated last year
NROwind / OpenGPT-4o-Image
View on GitHub
A Comprehensive Dataset for Advanced Image Generation and Editing}
☆33Oct 2, 2025Updated 9 months ago
bigai-nlco / CREAM
View on GitHub
[NeurIPS 2024] | An Efficient Recipe for Long Context Extension via Middle-Focused Positional Encoding
☆22Oct 10, 2024Updated last year
isjinghao / SemiT-SAM
View on GitHub
[MICCAI 2024 workshop] Official implementation of "SemiT-SAM: Building a Visual Foundation Model for Tooth Instance Segmentation on Panor…
☆15Nov 13, 2024Updated last year
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
zhuzil / VTC-Bench
View on GitHub
VisualToolChain-Bench
☆48May 4, 2026Updated 2 months ago
zou-yawen / Dataset-Distillation-via-Vision-Language-Category-Prototype
View on GitHub
Dataset Distillation via Vision-Language Category Prototype (ICCV 2025)
☆17Mar 20, 2026Updated 4 months ago
MarilynKeller / BTInference
View on GitHub
Bone and Tissue inference wrapper
☆16Nov 7, 2024Updated last year
agents-x-project / PyVision-RL
View on GitHub
[ICML 2026] Official implementation of "PyVision-RL: Forging Open Agentic Vision Models via RL."
☆70Feb 25, 2026Updated 5 months ago
sterzhang / PVIT
View on GitHub
Official Repository of Personalized Visual Instruct Tuning
☆34Mar 6, 2025Updated last year
Ki-Zhang / HUST_AIA_MachineLearning_Big_Homework
View on GitHub
小样本跨域学习-模式识别大作业
☆28May 10, 2024Updated 2 years ago
Malberee / merlo-ui
View on GitHub
🚀 Beautiful React Native UI library
☆16Dec 26, 2025Updated 7 months ago
egolife-ai / Ego-R1
View on GitHub
[TPAMI 2026] Ego-R1: Agentic Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
☆165Jun 10, 2026Updated last month
Andy-Cheng / TEMPURA
View on GitHub
TEMPURA enables video-language models to reason about causal event relationships and generate fine-grained, timestamped descriptions of u…
☆27Jun 4, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Wang-Xiaodong1899 / Awesome-Multimodal-Large-Language-Models
View on GitHub
🔥Awesome Multimodal Large Language Models Paper List
☆154Mar 12, 2025Updated last year
G-U-N / UniRL
View on GitHub
[ICML 2026] a unified reinforcement learning toolbox for joint RL on language models and diffusion models
☆91May 26, 2026Updated 2 months ago
Jialuo-Li / DIG
View on GitHub
[CVPR 2026] Divide, then Ground: Adapting Frame Selection to Query Types for Long-Form Video Understanding
☆21Feb 21, 2026Updated 5 months ago
khive-ai / pydapter
View on GitHub
adapt data to and from every format
☆28Apr 27, 2026Updated 3 months ago
JHU-CLSP / RATIONALYST
View on GitHub
Code for RATIONALYST: Pre-training Process-Supervision for Improving Reasoning https://arxiv.org/pdf/2410.01044
☆36Oct 3, 2024Updated last year
chenlong-clock / RULE-Unlearn
View on GitHub
[NeurIPS25] RULE: Reinforcement UnLEarning Achieves Forge-retain Pareto Optimality
☆20Oct 22, 2025Updated 9 months ago
Wang-ML-Lab / multimodal-needle-in-a-haystack
View on GitHub
[NAACL 2025 Oral] Multimodal Needle in a Haystack (MMNeedle): Benchmarking Long-Context Capability of Multimodal Large Language Models
☆55Apr 22, 2026Updated 3 months ago