Niujunbo2002/NativeRes-LLaVA

Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"

☆55

Alternatives and similar repositories for NativeRes-LLaVA

Users who are interested in NativeRes-LLaVA are comparing it to the repositories listed below.

JoeLeelyf / OVO-Bench
[CVPR 2025] OVO-Bench: How Far is Your Video-LLMs from Real-World Online Video Understanding?
☆154 · Jul 24, 2025 · Updated 11 months ago
EvolvingLMMs-Lab / LLaVA-OneVision-1.5-RL
Fully Open Framework for Democratized Multimodal Reinforcement Learning.
☆51 · Dec 19, 2025 · Updated 7 months ago
tanABCC / VABench
☆16 · Jul 8, 2026 · Updated 2 weeks ago
xiaoxing2001 / DeGLA
[ACM MM25] Official Pytorch implementation of [Decoupled Global-Local Alignment for Improving Compositional Understanding]
☆16 · Jul 15, 2025 · Updated last year
THUNLP-MT / CODIS
Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".
☆13 · Oct 14, 2024 · Updated last year
TobyYang7 / Llava_Qwen2
Visual Instruction Tuning for Qwen2 Base Model
☆43 · Jun 29, 2024 · Updated 2 years ago
Luodian / nano-hevc
A minimal, educational HEVC (H.265) encoder written in Python.
☆53 · Feb 23, 2026 · Updated 5 months ago
opendatalab / MinerU-Diffusion
[ECCV 2026] A diffusion-based framework for document OCR that replaces autoregressive decoding with block-level parallel diffusion decodi…
☆625 · Jun 18, 2026 · Updated last month
ZichenWen1 / DIJA
(ICLR 2026 🔥) Code for "The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs"
☆79 · Feb 9, 2026 · Updated 5 months ago
InternLM / OVO-S-Bench
An official implementation of "OVO-S-Bench: A Hierarchical Benchmark for Streaming Spatial Intelligence in Multimodal LLMs"
☆47 · Jun 24, 2026 · Updated 3 weeks ago
VisionXLab / ProCLIP
Official PyTorch implementation of ProCLIP: Progressive Vision-Language Alignment via LLM-based Embedder
☆25 · Dec 4, 2025 · Updated 7 months ago
deepglint / MLCD-Seg
MLCD-Seg is a zero-shot segmentation model from DeepGlint.
☆18 · Jul 4, 2025 · Updated last year
kyegomez / open-moonvit
This is an ultra-simple, single-file PyTorch implementation of MoonViT, the native-resolution vision encoder from Kimi-VL.
☆28 · Apr 25, 2026 · Updated 2 months ago
EvolvingLMMs-Lab / ParaVT
ParaVT: Taming the Tool Prior Paradox for Parallel Tool Use in Agentic Video Reinforcement Learning
☆54 · Jun 2, 2026 · Updated last month
deepglint / DanQing
The official repo for the DanQing dataset.
☆36 · Mar 25, 2026 · Updated 3 months ago
NJU-LINK / MT-Video-Bench
The Source Code for MT-Video-Bench @ ACL Findings 2026
☆22 · Jan 20, 2026 · Updated 6 months ago
sakura20221 / RT-RAG
☆18 · Jan 16, 2026 · Updated 6 months ago
InternLM / ARC-VL
[CVPR 2026] An official implementation of "Think Visually, Reason Textually: Vision-Language Synergy in ARC"
☆46 · Nov 26, 2025 · Updated 7 months ago
InnovatorLM / Innovator-VL
Fully Open-source Multimodal Language Models for Science Discovery
☆167 · Mar 20, 2026 · Updated 4 months ago
sunye23 / SAMA
[NeurIPS 2025] SAMA: Towards Multi-Turn Referential Grounded Video Chat with Large Language Models.
☆17 · May 26, 2026 · Updated last month
OpenIXCLab / CODA
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning
☆37 · Aug 28, 2025 · Updated 10 months ago
Gen-Verse / CURE
[NeurIPS 2025 Spotlight] Co-Evolving LLM Coder and Unit Tester via Reinforcement Learning
☆167 · Sep 19, 2025 · Updated 10 months ago
gaostar123 / DeViL
[ACM MM 2026] Detector-Empowered Video Large Language Model for Efficient Spatio-Temporal Grounding
☆27 · Jul 12, 2026 · Updated last week
deepglint / MVT
Margin-based Vision Transformer
☆70 · Apr 7, 2026 · Updated 3 months ago
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆385 · Jun 20, 2026 · Updated last month
deepglint / UniDoc-RL
UniDoc-RL: Unified Document Understanding with Reinforcement Learning
☆16 · May 21, 2026 · Updated 2 months ago
MCG-NJU / VideoEval
VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model
☆15 · Jul 31, 2025 · Updated 11 months ago
opendatalab / OHR-Bench
(ICCV 2025) OCR Hinders RAG: Evaluating the Cascading Impact of OCR on Retrieval-Augmented Generation
☆104 · Dec 3, 2025 · Updated 7 months ago
TheRoadQaQ / ReLIFT
Official Repository of "Learning what reinforcement learning can't"
☆84 · Dec 30, 2025 · Updated 6 months ago
liuting20 / DARA
[ICME 2024 Oral] DARA: Domain- and Relation-aware Adapters Make Parameter-efficient Tuning for Visual Grounding
☆22 · Feb 26, 2025 · Updated last year
EvolvingLMMs-Lab / OpenMMReasoner
[CVPR 2026] OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe
☆164 · Mar 30, 2026 · Updated 3 months ago
DPOOJ / dpooj
Data Points Oriented Online Judge system for OO course
☆35 · Jun 15, 2024 · Updated 2 years ago
MiracleDance / CAR
CAR: Controllable AutoRegressive Modeling for Visual Generation
☆129 · Nov 29, 2024 · Updated last year
OpenGVLab / OmniCorpus
[ICLR 2025 Spotlight] OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
☆425 · May 5, 2025 · Updated last year
hshjerry / VideoEspresso
[CVPR 2025 Oral] VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
☆140 · Jul 28, 2025 · Updated 11 months ago
locuslab / llava-token-compression
☆47 · Nov 8, 2024 · Updated last year
LaVi-Lab / Visual-Table
[EMNLP 2024] Official code for "Beyond Embeddings: The Promise of Visual Table in Multi-Modal Models"
☆20 · Oct 17, 2024 · Updated last year
Mark12Ding / Dispider
[CVPR 2025] Dispider: Enabling Video LLMs with Active Real-Time Interaction via Disentangled Perception, Decision, and Reaction
☆180 · Mar 23, 2025 · Updated last year
opendatalab / mineru-vl-utils
A Python package for interacting with the MinerU Vision-Language Model.
☆136 · Jun 11, 2026 · Updated last month