OpenGVLab/V2PE

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/OpenGVLab/V2PE)

OpenGVLab / V2PE

[ICCV2025] V2PE: Improving Multimodal Long-Context Capability of Vision-Language Models with Variable Visual Position Encoding

☆60

Alternatives and similar repositories for V2PE

Users that are interested in V2PE are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

haon-chen / mmE5
View on GitHub
☆59Feb 27, 2025Updated last year
hrlics / HoPE
View on GitHub
[NeurIPS 2025] HoPE: Hybrid of Position Embedding for Long Context Vision-Language Models
☆29Feb 19, 2026Updated 5 months ago
TIGER-AI-Lab / VideoEval-Pro
View on GitHub
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation [TMLR26]
☆15Jun 1, 2026Updated last month
zhjohnchan / bert-clip-synesthesia
View on GitHub
[Findings of ACL-2023] This is the official implementation of On the Difference of BERT-style and CLIP-style Text Encoders.
☆14Jun 7, 2023Updated 3 years ago
yuecao0119 / MMInstruct
View on GitHub
[SCIS 2024] The official implementation of the paper "MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Di…
☆64Nov 7, 2024Updated last year
1-Click AI Models by DigitalOcean Gradient • Ad
Deploy popular AI models on DigitalOcean Gradient GPU virtual machines with just a single click. Zero configuration with optimized deployments.
findalexli / mllm-dpo
View on GitHub
[ACL 2024] Multi-modal preference alignment remedies regression of visual instruction tuning on language model
☆48Nov 10, 2024Updated last year
namespace-Pt / UltraGist
View on GitHub
☆18Dec 2, 2024Updated last year
ByteDance-Seed / SAIL
View on GitHub
Implementation for "The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer"
☆85Oct 29, 2025Updated 8 months ago
TIGER-AI-Lab / VISTA
View on GitHub
The code for "VISTA: Enhancing Long-Duration and High-Resolution Video Understanding by VIdeo SpatioTemporal Augmentation" [CVPR2025]
☆20Feb 27, 2025Updated last year
liutianlin0121 / decoding-time-realignment
View on GitHub
Implementation of "Decoding-time Realignment of Language Models", ICML 2024.
☆21Jun 17, 2024Updated 2 years ago
ErikZ719 / CoTA
View on GitHub
[ICLR 26] Context Tokens are Anchors: Understanding the Repeat Curse in dMLLMs from an Information Flow Perspective
☆16Mar 6, 2026Updated 4 months ago
TIGER-AI-Lab / ABC
View on GitHub
ABC: Achieving Better Control of Multimodal Embeddings using VLMs [TMLR2025]
☆19Aug 21, 2025Updated 10 months ago
haoyu-bu / CAFe
View on GitHub
Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"
☆33Mar 26, 2025Updated last year
uvavision / SyViC
View on GitHub
[ICCV 2023] Going Beyond Nouns With Vision & Language Models Using Synthetic Data
☆13Sep 30, 2023Updated 2 years ago
End-to-end encrypted email - Proton Mail • Ad
Special offer: 40% Off Yearly / 80% Off First Month. All Proton services are open source and independently audited for security.
YuxiXie / V-DPO
View on GitHub
Preference Learning for LLaVA
☆60Nov 9, 2024Updated last year
sled-group / COMFORT
View on GitHub
[ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…
☆22Oct 24, 2024Updated last year
TobiasLee / VEC
View on GitHub
Visual and Embodied Concepts evaluation benchmark
☆21Oct 10, 2023Updated 2 years ago
lemon-prog123 / LongRePS
View on GitHub
Chain-of-Thought Matters: Improving Long-Context Language Models with Reasoning Path Supervision
☆19Apr 1, 2025Updated last year
deepglint / UniDoc-RL
View on GitHub
UniDoc-RL: Unified Document Understanding with Reinforcement Learning
☆16May 21, 2026Updated 2 months ago
Lyun0912-wu / LongAttn
View on GitHub
LongAttn ：Selecting Long-context Training Data via Token-level Attention
☆15Jul 16, 2025Updated last year
eesast / web
View on GitHub
Web front for EESAST
☆11Updated this week
deepglint / Victor
View on GitHub
ViCToR: Improving Visual Comprehension via Token Reconstruction for Pretraining LMMs
☆29Aug 15, 2025Updated 11 months ago
Wiselnn570 / VideoRoPE
View on GitHub
[ICML 2025 Oral] An official implementation of VideoRoPE & VideoRoPE++
☆223Apr 15, 2026Updated 3 months ago
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
amitakamath / vl_text_encoders_are_bottlenecks
View on GitHub
Code and datasets for "Text encoders are performance bottlenecks in contrastive vision-language models". Coming soon!
☆11May 24, 2023Updated 3 years ago
KlingAIResearch / Uniaa
View on GitHub
Unified Multi-modal IAA Baseline and Benchmark
☆94Sep 27, 2024Updated last year
locuslab / scaling_laws_data_filtering
View on GitHub
☆64Apr 9, 2024Updated 2 years ago
ys-zong / MIRB
View on GitHub
Benchmarking Multi-Image Understanding in Vision and Language Models
☆11Jul 29, 2024Updated last year
KarnaYip / C2RoPE
View on GitHub
[ICRA 26] C^2ROPE: Causal Continuous Rotary Positional Encoding for 3D Large Multimodal-Models Reasoning
☆27Feb 13, 2026Updated 5 months ago
DAMO-NLP-SG / CMM
View on GitHub
✨✨The Curse of Multi-Modalities (CMM): Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio
☆54Jul 11, 2025Updated last year
TIGER-AI-Lab / Mantis
View on GitHub
Official code for Paper "Mantis: Multi-Image Instruction Tuning" [TMLR 2024 Best Paper]
☆239Jan 3, 2026Updated 6 months ago
open-compass / MMBench-GUI
View on GitHub
Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent w…
☆112Sep 8, 2025Updated 10 months ago
lgxi24 / AdaBlock-dLLM
View on GitHub
[ICLR 2026] AdaBlock-dLLM: Semantic-Aware Diffusion LLM Inference via Adaptive Block Size
☆15Jan 28, 2026Updated 5 months ago
Serverless GPU API endpoints on Runpod - Get Bonus Credits • Ad
Skip the infrastructure headaches. Auto-scaling, pay-as-you-go, no-ops approach lets you focus on innovating your application.
llyx97 / TempCompass
View on GitHub
[ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …
☆133Apr 4, 2025Updated last year
ChangyaoTian / VL-LTR
View on GitHub
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
☆69Oct 1, 2022Updated 3 years ago
chenxn2020 / GOSE
View on GitHub
[Paper] Code for the EMNLP2023 (Findings) paper "Global Structure Knowledge-Guided Relation Extraction Method for Visually-Rich Document"
☆17Dec 1, 2023Updated 2 years ago
baaivision / EVE
View on GitHub
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374Jul 24, 2025Updated 11 months ago
DIRECT-BIT / SRA-MCTS
View on GitHub
☆36Jun 5, 2025Updated last year
adobe-research / llava-score
View on GitHub
☆11Oct 2, 2024Updated last year
TIGER-AI-Lab / MEGA-Bench
View on GitHub
This repo contains the code for "MEGA-Bench Scaling Multimodal Evaluation to over 500 Real-World Tasks" [ICLR 2025]
☆81Jul 1, 2025Updated last year