Niujunbo2002 / NativeRes-LLaVA
Official code repo for our work "Native Visual Understanding: Resolving Resolution Dilemmas in Vision-Language Models"
☆39 · Updated last month
Alternatives and similar repositories for NativeRes-LLaVA
Users interested in NativeRes-LLaVA are comparing it to the repositories listed below.
- ☆89 · Updated 3 months ago
- TokLIP: Marry Visual Tokens to CLIP for Multimodal Comprehension and Generation ☆101 · Updated last month
- (CVPR 2025) PyramidDrop: Accelerating Your Large Vision-Language Models via Pyramid Visual Redundancy Reduction ☆115 · Updated 4 months ago
- ☆53 · Updated 2 months ago
- ☆38 · Updated 2 months ago
- ☆174 · Updated 3 weeks ago
- A Collection of Papers on Diffusion Language Models ☆90 · Updated last week
- [CVPR 2025] Adaptive Keyframe Sampling for Long Video Understanding ☆80 · Updated 2 months ago
- Official Repository of paper: Envisioning Beyond the Pixels: Benchmarking Reasoning-Informed Visual Editing ☆69 · Updated last week
- Code for "Stop Looking for Important Tokens in Multimodal Language Models: Duplication Matters More" ☆60 · Updated 2 months ago
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning ☆193 · Updated 2 months ago
- [CVPR 2025] DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models ☆60 · Updated 2 weeks ago
- A paper list about Token Merge, Reduce, Resample, Drop for MLLMs. ☆67 · Updated 6 months ago
- NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation ☆78 · Updated last month
- The official implementation of "Routing Experts: Learning to Route Dynamic Experts in Existing Multi-modal Large Language Models" ☆15 · Updated 3 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph. ☆22 · Updated 5 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models". ☆176 · Updated last month
- Official implementation of MIA-DPO ☆59 · Updated 5 months ago
- [CVPR 2025] PVC: Progressive Visual Token Compression for Unified Image and Video Processing in Large Vision-Language Models ☆45 · Updated last month
- Fast-Slow Thinking for Large Vision-Language Model Reasoning ☆16 · Updated 2 months ago
- TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning ☆83 · Updated last month
- [ACM MM 2025] TimeChat-online: 80% Visual Tokens are Naturally Redundant in Streaming Videos ☆58 · Updated last week
- Empowering Unified MLLM with Multi-granular Visual Generation ☆126 · Updated 6 months ago
- Dimple, the first Discrete Diffusion Multimodal Large Language Model ☆78 · Updated last week
- ☆87 · Updated 3 weeks ago
- The code repository of UniRL ☆33 · Updated last month
- LLaVA-PruMerge: Adaptive Token Reduction for Efficient Large Multimodal Models ☆140 · Updated 2 weeks ago
- [CVPR2025] BOLT: Boost Large Vision-Language Model Without Training for Long-form Video Understanding ☆23 · Updated 3 months ago
- 📖 This is a repository for organizing papers, codes, and other resources related to unified multimodal models. ☆256 · Updated 3 weeks ago
- Official repository for CoMM Dataset ☆43 · Updated 6 months ago