PVIT-official/PVIT

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/PVIT-official/PVIT)

PVIT-official / PVIT

Repository of paper: Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models

☆37

Alternatives and similar repositories for PVIT

Users that are interested in PVIT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

RUCAIBox / ComVint
View on GitHub
The official GitHub page for ''What Makes for Good Visual Instructions? Synthesizing Complex Visual Reasoning Instructions for Visual Ins…
☆19Nov 10, 2023Updated 2 years ago
THUNLP-MT / CODIS
View on GitHub
Repo for paper "CODIS: Benchmarking Context-Dependent Visual Comprehension for Multimodal Large Language Models".
☆13Oct 14, 2024Updated last year
mightyzau / RegionBLIP
View on GitHub
☆59Aug 7, 2023Updated 2 years ago
jshilong / GPT4RoI
View on GitHub
(ECCVW 2025)GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
☆556Jun 3, 2025Updated last year
shikras / shikra
View on GitHub
☆814Jul 8, 2024Updated 2 years ago
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
THUNLP-MT / EscapeCraft
View on GitHub
Official repo for EscapeCraft (an 3D environment for room escape) and benchmark MM-Escape. This work is accepted by ICCV 2025.
☆39Jul 7, 2025Updated last year
THUNLP-MT / ModelCompose
View on GitHub
Official code for our paper "Model Composition for Multimodal Large Language Models" (ACL 2024)
☆31Jan 8, 2025Updated last year
SCNU203 / GeoQA-Plus
View on GitHub
☆20May 14, 2024Updated 2 years ago
Meituan-AutoML / Lenna
View on GitHub
☆87Feb 5, 2024Updated 2 years ago
vlf-silkie / VLFeedback
View on GitHub
☆102Dec 22, 2023Updated 2 years ago
OFA-Sys / TouchStone
View on GitHub
Touchstone: Evaluating Vision-Language Models by Language Models
☆84Jan 18, 2024Updated 2 years ago
LiWentomng / gradio-osprey-demo
View on GitHub
Gradio demo used in our Osprey:Pixel Understanding with Visual Instruction Tuning.
☆16Dec 19, 2023Updated 2 years ago
zjuchenlong / WSAG
View on GitHub
[EMNLP'22] Weakly-Supervised Temporal Article Grounding
☆14Nov 25, 2023Updated 2 years ago
Yuqifan1117 / HalluciDoctor
View on GitHub
HalluciDoctor: Mitigating Hallucinatory Toxicity in Visual Instruction Data (Accepted by CVPR 2024)
☆52Jul 16, 2024Updated 2 years ago
Bare Metal GPUs on DigitalOcean Gradient AI • Ad
Purpose-built for serious AI teams training foundational models, running large-scale inference, and pushing the boundaries of what's possible.
FuxiaoLiu / LRV-Instruction
View on GitHub
[ICLR'24] Mitigating Hallucination in Large Multi-Modal Models via Robust Instruction Tuning
☆297Mar 13, 2024Updated 2 years ago
HaozheZhao / MIC
View on GitHub
MMICL, a state-of-the-art VLM with the in context learning ability from ICL, PKU
☆361Dec 18, 2023Updated 2 years ago
shizhediao / automate-cot
View on GitHub
Source code for the paper "Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data"
☆20Feb 24, 2024Updated 2 years ago
Mxbonn / ltmp
View on GitHub
Code for Learned Thresholds Token Merging and Pruning for Vision Transformers (LTMP). A technique to reduce the size of Vision Transforme…
☆17Nov 24, 2024Updated last year
zmykevin / UVLP
View on GitHub
CVPR 2022 (Oral) Pytorch Code for Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
☆21Apr 15, 2022Updated 4 years ago
jy0205 / LaVIT
View on GitHub
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603Oct 6, 2024Updated last year
X2FD / LVIS-INSTRUCT4V
View on GitHub
☆134Dec 22, 2023Updated 2 years ago
WPR001 / Ego-ST
View on GitHub
☆16Sep 25, 2025Updated 10 months ago
SY-Xuan / Pink
View on GitHub
Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs
☆100Jan 16, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
codecat0 / classifiction_networks
View on GitHub
图像分类网络Pytorch实现
☆14Nov 26, 2021Updated 4 years ago
LouChao98 / VLGAE
View on GitHub
Official Implementation for CVPR 2022 paper "Unsupervised Vision-Language Parsing: Seamlessly Bridging Visual Scene Graphs with Language …
☆24Oct 19, 2022Updated 3 years ago
OpenGVLab / LAMM
View on GitHub
[NeurIPS 2023 Datasets and Benchmarks Track] LAMM: Multi-Modal Large Language Models and Applications as AI Agents
☆317Apr 16, 2024Updated 2 years ago
JiwanChung / vlis
View on GitHub
☆24Oct 9, 2023Updated 2 years ago
DCDmllm / Cheetah
View on GitHub
☆354May 25, 2024Updated 2 years ago
yu2hi13 / P2SAM
View on GitHub
Official implementation for P2SAM (ACM MM 2024)
☆14Dec 7, 2024Updated last year
BAAI-DCAI / DataOptim
View on GitHub
A collection of visual instruction tuning datasets.
☆77Mar 14, 2024Updated 2 years ago
ChangyuChen347 / MaskedThought
View on GitHub
[ACL 2024] Masked Thought: Simply Masking Partial Reasoning Steps Can Improve Mathematical Reasoning Learning of Language Models
☆27Jul 9, 2024Updated 2 years ago
UCSC-VLAA / Sight-Beyond-Text
View on GitHub
[TMLR 2024] Official implementation of "Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics"
☆20Sep 15, 2023Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
Surrey-UP-Lab / RegionSpot
View on GitHub
Recognize Any Regions
☆123Dec 18, 2024Updated last year
takomc / amp
View on GitHub
【NeurIPS 2024】The official code of paper "Automated Multi-level Preference for MLLMs"
☆22Sep 26, 2024Updated last year
ylsung / vl-merging
View on GitHub
PyTorch codes for the paper "An Empirical Study of Multimodal Model Merging"
☆37Oct 11, 2023Updated 2 years ago
HL-hanlin / SMKD
View on GitHub
Code implementation for the paper: Supervised Masked Knowledge Distillation for Few-Shot Transformers
☆42Mar 29, 2023Updated 3 years ago
jingyanghuo / GeoVLN
View on GitHub
This is the official PyTorch implementation of the CVPR 2023 paper: "GeoVLN: Learning Geometry-Enhanced Visual Representation with Slot A…
☆10Mar 17, 2024Updated 2 years ago
mair-lab / mapl
View on GitHub
☆30May 27, 2023Updated 3 years ago
HLR / VLN-trans
View on GitHub
[ACL2023] Official code repository for VLN-Trans
☆14Sep 10, 2023Updated 2 years ago