d3tk / REOrder
Does patch ordering affect context-limited vision transformers?
☆13 Updated last month
Alternatives and similar repositories for REOrder
Users who are interested in REOrder are comparing it to the libraries listed below.
- ☆32 Updated 2 months ago
- ☆98 Updated 3 months ago
- ☆38 Updated 11 months ago
- LoRA-Ensemble: Efficient Uncertainty Modelling for Self-attention Networks ☆48 Updated 9 months ago
- Official PyTorch Implementation for Vision-Language Models Create Cross-Modal Task Representations, ICML 2025 ☆27 Updated 2 months ago
- [ICLR 2025] Source code for paper "A Spark of Vision-Language Intelligence: 2-Dimensional Autoregressive Transformer for Efficient Finegr… ☆76 Updated 7 months ago
- Official implementation of "PyVision: Agentic Vision with Dynamic Tooling." ☆24 Updated last week
- Official implementation of the paper "MMInA: Benchmarking Multihop Multimodal Internet Agents" ☆46 Updated 4 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆102 Updated last week
- Official implementation and dataset for the NAACL 2024 paper "ComCLIP: Training-Free Compositional Image and Text Matching" ☆35 Updated 11 months ago
- OLA-VLM: Elevating Visual Perception in Multimodal LLMs with Auxiliary Embedding Distillation, arXiv 2024 ☆60 Updated 4 months ago
- [ECCV 2024] Official Release of SILC: Improving vision language pretraining with self-distillation ☆44 Updated 9 months ago
- [CVPR'24 Highlight] PyTorch Implementation of Object Recognition as Next Token Prediction ☆180 Updated 2 months ago
- ZoomEye: Enhancing Multimodal LLMs with Human-Like Zooming Capabilities through Tree-Based Image Exploration ☆46 Updated 6 months ago
- Official implementation of ECCV24 paper: POA ☆24 Updated 11 months ago
- Scaffold Prompting to promote LMMs ☆43 Updated 7 months ago
- ☆61 Updated 4 months ago
- Evaluation and dataset construction code for the CVPR 2025 paper "Vision-Language Models Do Not Understand Negation" ☆27 Updated 2 months ago
- Code for the paper "Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers" [ICCV 2025] ☆73 Updated 3 weeks ago
- Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers" ☆75 Updated 3 weeks ago
- [ICLR 2025] Video Action Differencing ☆41 Updated 2 weeks ago
- Code for "Are “Hierarchical” Visual Representations Hierarchical?" in NeurIPS Workshop for Symmetry and Geometry in Neural Representation… ☆21 Updated last year
- An open source implementation of CLIP (With TULIP Support) ☆160 Updated 2 months ago
- [ICLR 2025] Video-STaR: Self-Training Enables Video Instruction Tuning with Any Supervision ☆65 Updated last year
- This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR2025] ☆71 Updated 2 weeks ago
- [AAAI 2025] Does VLM Classification Benefit from LLM Description Semantics? ☆17 Updated 6 months ago
- Official repository of paper "Subobject-level Image Tokenization" (ICML-25) ☆78 Updated 2 weeks ago
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025] ☆35 Updated last week
- Code and data for the paper: Learning Action and Reasoning-Centric Image Editing from Videos and Simulation ☆30 Updated 2 weeks ago
- ☆22 Updated 6 months ago