JJJYmmm/Multimodal-RoPEs

Official implementation of the paper "Revisiting Multimodal Positional Encoding in Vision–Language Models", ICLR 2026

☆88

Alternatives and similar repositories for Multimodal-RoPEs

Users that are interested in Multimodal-RoPEs are comparing it to the libraries listed below.

Shwai-He / SparseUnifiedModel
The official implementation of the paper "Understanding and Harnessing Sparsity in Unified Multimodal Models".
☆23 · Apr 25, 2026 · Updated 2 months ago
aiha-lab / InfiniPot-V
[NeurIPS 25] InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
☆20 · Jan 25, 2026 · Updated 5 months ago
Bujiazi / DiCache
[ICLR 2026] Official implementation of DiCache: Let Diffusion Model Determine Its Own Cache
☆61 · Jan 26, 2026 · Updated 5 months ago
DTennant / GPC
☆24 · Oct 12, 2024 · Updated last year
Yanwen-W / TeRA
[ICCV 2025] TeRA: Rethinking Text-guided Realistic 3D Avatar Generation
☆19 · Sep 13, 2025 · Updated 10 months ago
ByteDance-Seed / SAIL
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
☆85 · Oct 29, 2025 · Updated 8 months ago
OpenGVLab / V2PE
[ICCV 2025] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding
☆60 · Apr 4, 2026 · Updated 3 months ago
EvolvingLMMs-Lab / OneVision-Encoder
Codec-Aligned Sparsity as a Foundational Principle for Multimodal Intelligence
☆385 · Jun 20, 2026 · Updated last month
xingdi-eric-yuan / qr-decoder
☆13 · Nov 9, 2014 · Updated 11 years ago
OpenGVLab / Mono-InternVL
[CVPR 2025] Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
☆109 · Jul 18, 2025 · Updated last year
MengLcool / DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effect…
☆93 · Jun 17, 2024 · Updated 2 years ago
zlab-princeton / llm-distillation-jax
JAX implementation of configurable LLM distillation training
☆24 · Nov 15, 2025 · Updated 8 months ago
byminji / map-the-flow
[ICLR 2026] Official implementation of the paper "Map the Flow: Revealing Hidden Pathways of Information in VideoLLMs"
☆24 · Mar 3, 2026 · Updated 4 months ago
zhuyjan / WikiSeeker
[ACL 2026] WikiSeeker: Rethinking the Role of Vision-Language Models in Knowledge-Based Visual Question Answering.
☆15 · Apr 18, 2026 · Updated 3 months ago
Becomebright / MTV
Revisiting Multi-Task Visual Representation Learning
☆22 · Jan 21, 2026 · Updated 6 months ago
hrlics / HoPE
[NeurIPS 2025] HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models
☆29 · Feb 19, 2026 · Updated 5 months ago
snap-research / diffusability
Source code for "Improving the Diffusability of Autoencoders" [ICML 2025]
☆21 · Jan 6, 2026 · Updated 6 months ago
JJJYmmm / Pix2SeqV2-Pytorch
Simple Implementation of Pix2seqV2 (multi-task)
☆26 · Dec 16, 2024 · Updated last year
CASIA-IVA-Lab / VRoPE
[EMNLP 2025 Main] Official implementation of VRoPE: Rotary Position Embedding for Video Large Language Models.
☆28 · Nov 18, 2025 · Updated 8 months ago
meituan-longcat / LongCat-Next
☆464 · Updated this week
FengheTan9 / MambaMIM
[MedIA 2025] MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation
☆41 · Aug 10, 2025 · Updated 11 months ago
ashawkey / grid_put
An operation trying to do the opposite of F.grid_sample
☆20 · Aug 8, 2023 · Updated 2 years ago
KarnaYip / C2RoPE
[ICRA 26] C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning
☆27 · Feb 13, 2026 · Updated 5 months ago
cszzx / GRAIN
[CVPR 2022 Oral] Faithful Extreme Rescaling via Generative Prior Reciprocated Invertible Representations
☆13 · Jul 14, 2022 · Updated 4 years ago
synvo-ai / local-cocoa
A local AI assistant running on your device. It turns your files into actionable memory.
☆55 · Mar 24, 2026 · Updated 3 months ago
lgxi24 / AdaBlock-dLLM
[ICLR 2026] AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
☆15 · Jan 28, 2026 · Updated 5 months ago
ATH-MaaS / Awesome-Unified-Multimodal-Models
Awesome Unified Multimodal Models
☆1,302 · Mar 24, 2026 · Updated 3 months ago
SKT-AI / A.X-3
SKT A.X LLM 3.1
☆13 · Jul 24, 2025 · Updated 11 months ago
ReyChiaro / diffusers-tuner
A Lightweight, Configuration-Driven, Flexible Fine-Tuning Framework for 🤗 Diffusers
☆17 · Apr 15, 2026 · Updated 3 months ago
Bezdarnost / awesome-super-resolution
A collection, with descriptions, of super-resolution papers, repositories, datasets, loss functions, etc.
☆11 · Dec 12, 2023 · Updated 2 years ago
nsping13 / GAN-Steerability-without-optimization
☆15 · Jan 12, 2024 · Updated 2 years ago
jinnh / E-Bridge
[ICLR 2026] Energy-oriented Diffusion Bridge for Image Restoration with Foundational Diffusion Models
☆15 · Apr 13, 2026 · Updated 3 months ago
SJTU-DENG-Lab / LoPA
LoPA: Scaling dLLM Inference via Lookahead Parallel Decoding
☆39 · Apr 25, 2026 · Updated 2 months ago
shubhamprshr27 / NeglectedTailsVLM
This repository houses the code for the paper - "The Neglected of VLMs"
☆30 · Dec 31, 2025 · Updated 6 months ago
EzioBy / 3dpe
[ECCV 2024] 3DPE: Real-time 3D-aware Portrait Editing from a Single Image
☆22 · Sep 15, 2025 · Updated 10 months ago
lose4578 / CircleRoPE
☆15 · Sep 1, 2025 · Updated 10 months ago
OpenWebRL / OpenWebRL
Code for paper OpenWebRL: Online Multi-Turn Reinforcement Learning for Visual Web Agents
☆37 · Jul 9, 2026 · Updated last week
Liuxinyv / HiPrompt
[IJCV 2026] HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts
☆26 · Feb 28, 2025 · Updated last year
Wiselnn570 / VideoRoPE
[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++
☆223 · Apr 15, 2026 · Updated 3 months ago