kyegomez/NaViT

Readme badge preview -

If you own this repo, copy the snippet below and add it to your README.md

[![RelatedRepos](https://img.shields.io/badge/related-repos-yellow)](https://relatedrepos.com/gh/kyegomez/NaViT)

kyegomez / NaViT

My implementation of "Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution"

☆274

Alternatives and similar repositories for NaViT

Users that are interested in NaViT are comparing it to the libraries listed below. We may earn a commission when you buy through links labeled 'Ad' on this page.

Sorting:

kyegomez / KosmosG
View on GitHub
My implementation of the model KosmosG from "KOSMOS-G: Generating Images in Context with Multimodal Large Language Models"
☆13Nov 11, 2024Updated last year
mira-space / MiraData
View on GitHub
Official repo for paper "MiraData: A Large-Scale Video Dataset with Long Durations and Structured Captions"
☆528Sep 2, 2024Updated last year
thunlp / LLaVA-UHD
View on GitHub
LLaVA-UHD v3: Progressive Visual Compression for Efficient Native-Resolution Encoding in MLLMs
☆423Jul 6, 2026Updated 2 weeks ago
ParadoxZW / LLaVA-UHD-Better
View on GitHub
A bug-free and improved implementation of LLaVA-UHD, based on the code from the official repo
☆35Aug 12, 2024Updated last year
lucidrains / magvit2-pytorch
View on GitHub
Implementation of MagViT2 Tokenizer in Pytorch
☆668Jan 12, 2025Updated last year
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
NUS-HPC-AI-Lab / VideoSys
View on GitHub
VideoSys: An easy and efficient system for video generation
☆2,026Aug 27, 2025Updated 10 months ago
tsb0601 / MMVP
View on GitHub
☆364Jan 27, 2024Updated 2 years ago
facebookresearch / DiT
View on GitHub
Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"
☆8,689May 31, 2024Updated 2 years ago
TencentARC / SEED-Voken
View on GitHub
SEED-Voken: A Series of Powerful Visual Tokenizers
☆1,018Nov 25, 2025Updated 8 months ago
Vchitect / Latte
View on GitHub
[TMLR 2025] Latte: Latent Diffusion Transformer for Video Generation.
☆1,946Oct 30, 2025Updated 8 months ago
kyegomez / ViTAR
View on GitHub
Implementation of ViTaR: ViTAR: Vision Transformer with Any Resolution in PyTorch
☆41Nov 11, 2024Updated last year
baaivision / EVE
View on GitHub
EVE Series: Encoder-Free Vision-Language Models from BAAI
☆374Jul 24, 2025Updated last year
WeihuangLin / INF-LLaVA
View on GitHub
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model
☆42Aug 4, 2024Updated last year
LLaVA-VL / LLaVA-NeXT
View on GitHub
☆4,713Jun 15, 2026Updated last month
Managed Kubernetes at scale on DigitalOcean • Ad
DigitalOcean Kubernetes includes the control plane, bandwidth allowance, container registry, automatic updates, and more for free.
kyegomez / MGQA
View on GitHub
The open source implementation of the multi grouped query attention by the paper "GQA: Training Generalized Multi-Query Transformer Model…
☆17Dec 11, 2023Updated 2 years ago
lichao-sun / SoraReview
View on GitHub
The official GitHub page for the review paper "Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision M…
☆504Mar 21, 2024Updated 2 years ago
cambrian-mllm / cambrian
View on GitHub
Cambrian-1 is a family of multimodal LLMs with a vision-centric design.
☆2,008Nov 7, 2025Updated 8 months ago
bfshi / scaling_on_scales
View on GitHub
When do we not need larger vision models?
☆420Feb 8, 2025Updated last year
kyegomez / PALI3
View on GitHub
Implementation of PALI3 from the paper PALI-3 VISION LANGUAGE MODELS: SMALLER, FASTER, STRONGER"
☆147Updated this week
pixeli99 / SVD_Xtend
View on GitHub
Stable Video Diffusion Training Code and Extensions.
☆733Jul 25, 2024Updated 2 years ago
baaivision / Emu3
View on GitHub
Next-Token Prediction is All You Need
☆2,433Jan 12, 2026Updated 6 months ago
Visual-Agent / DeepEyes
View on GitHub
☆1,251Nov 20, 2025Updated 8 months ago
google-research / big_vision
View on GitHub
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
☆3,500May 19, 2025Updated last year
Managed hosting for WordPress and PHP on Cloudways • Ad
Managed hosting for WordPress, Magento, Laravel, or PHP apps, on multiple cloud providers. Deploy in minutes on Cloudways by DigitalOcean.
Dawars / DocMAE
View on GitHub
Unofficial implementation of DocMAE (WIP): Document Image Rectification via Self-supervised Representation Learning
☆20Dec 20, 2023Updated 2 years ago
whlzy / FiT
View on GitHub
[ICML 2024 Spotlight] FiT: Flexible Vision Transformer for Diffusion Model
☆434Nov 10, 2024Updated last year
KlingAIResearch / Koala-36M
View on GitHub
Official implementation of the paper "Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Vi…
☆253Mar 19, 2025Updated last year
Max-We / sf-zero-signal-to-noise
View on GitHub
Trying to implement https://arxiv.org/abs/2305.08891
☆34Jun 10, 2023Updated 3 years ago
google-research / magvit
View on GitHub
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
☆1,002Jan 17, 2024Updated 2 years ago
CircleRadon / TokenPacker
View on GitHub
The code for "TokenPacker: Efficient Visual Projector for Multimodal LLM", IJCV2025
☆279May 26, 2025Updated last year
camenduru / FoleyCrafter-jupyter
View on GitHub
☆10Jun 28, 2024Updated 2 years ago
BAAI-DCAI / Visual-Instruction-Tuning
View on GitHub
SVIT: Scaling up Visual Instruction Tuning
☆167Jun 20, 2024Updated 2 years ago
kyegomez / AlphaDev
View on GitHub
Implementation of the model from "Faster sorting algorithms discovered using deep reinforcement learning" that discovered an all-new ult…
☆11Aug 29, 2023Updated 2 years ago
Wordpress hosting with auto-scaling - Free Trial Offer • Ad
Fully Managed hosting for WordPress and WooCommerce businesses that need reliable, auto-scalable performance. Cloudways SafeUpdates now available.
sail-sg / MDT
View on GitHub
Masked Diffusion Transformer is the SOTA for image synthesis. (ICCV 2023)
☆596Apr 23, 2024Updated 2 years ago
ai-forever / MoVQGAN
View on GitHub
MoVQGAN - model for the image encoding and reconstruction
☆266Oct 31, 2023Updated 2 years ago
OpenGVLab / InternVL
View on GitHub
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. 接近GPT-4o表现的开源多模态对话模型
☆10,102Sep 22, 2025Updated 10 months ago
HJYao00 / DenseConnector
View on GitHub
【NeurIPS 2024】Dense Connector for MLLMs
☆183Oct 14, 2024Updated last year
ByteDance-Seed / Seed1.5-VL
View on GitHub
Seed1.5-VL, a vision-language foundation model designed to advance general-purpose multimodal understanding and reasoning, achieving stat…
☆1,583Jun 14, 2025Updated last year
PixArt-alpha / PixArt-alpha
View on GitHub
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
☆3,301Oct 31, 2024Updated last year
jy0205 / LaVIT
View on GitHub
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
☆603Oct 6, 2024Updated last year