wangf3014 / Patch_ScalingLinks
Official implementation of Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
☆23Updated 9 months ago
Alternatives and similar repositories for Patch_Scaling
Users that are interested in Patch_Scaling are comparing it to the libraries listed below
Sorting:
- Adapting LLaMA Decoder to Vision Transformer☆30Updated last year
- ☆64Updated 6 months ago
- Github repository for "Bring Reason to Vision: Understanding Perception and Reasoning through Model Merging" (ICML 2025)☆86Updated 2 months ago
- DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding☆65Updated 5 months ago
- Official implementation of "Traceable Evidence Enhanced Visual Grounded Reasoning: Evaluation and Methodology"☆71Updated last month
- [ICLR2025] γ -MOD: Mixture-of-Depth Adaptation for Multimodal Large Language Models☆40Updated last month
- 🔥 [ICLR 2025] Official PyTorch Model "Visual Haystacks: A Vision-Centric Needle-In-A-Haystack Benchmark"☆22Updated 9 months ago
- [NeurIPS 2024] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective☆73Updated last year
- [COLM'25] Official implementation of the Law of Vision Representation in MLLMs☆170Updated 2 months ago
- Code for "AVG-LLaVA: A Multimodal Large Model with Adaptive Visual Granularity"☆33Updated last year
- Dimple, the first Discrete Diffusion Multimodal Large Language Model☆112Updated 4 months ago
- ☆45Updated last year
- [ECCV 2024] FlexAttention for Efficient High-Resolution Vision-Language Models☆46Updated 10 months ago
- More reliable Video Understanding Evaluation☆12Updated 2 months ago
- Distributed Optimization Infra for learning CLIP models☆27Updated last year
- Implementation and dataset for paper "Can MLLMs Perform Text-to-Image In-Context Learning?"☆41Updated 6 months ago
- ☆19Updated 10 months ago
- ☆46Updated 11 months ago
- Matryoshka Multimodal Models☆119Updated 10 months ago
- Official implement of MIA-DPO☆67Updated 10 months ago
- [TMLR] Public code repo for paper "A Single Transformer for Scalable Vision-Language Modeling"☆147Updated last year
- [ECCV 2024] AdaNAT: Exploring Adaptive Policy for Token-Based Image Generation☆34Updated last year
- [SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…☆60Updated last year
- [AAAI 2025] HiRED strategically drops visual tokens in the image encoding stage to improve inference efficiency for High-Resolution Visio…☆43Updated 7 months ago
- [CVPR'2025] VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".☆195Updated 5 months ago
- A PyTorch implementation of the paper "Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis"☆46Updated last year
- [NeurIPS 2024] Official PyTorch implementation of "Improving Compositional Reasoning of CLIP via Synthetic Vision-Language Negatives"☆46Updated last year
- ☆22Updated 6 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation☆72Updated 2 months ago
- (ICLR 2025 Spotlight) Official code repository for Interleaved Scene Graph.☆31Updated 3 months ago