Scaling Vision Pre-Training to 4K Resolution
☆221 · Jan 4, 2026 · Updated last month
Alternatives and similar repositories for PS3
Users who are interested in PS3 compare it to the libraries listed below.
- [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory ☆198 · Apr 26, 2025 · Updated 10 months ago
- PyTorch implementation of "Sample- and Parameter-Efficient Auto-Regressive Image Models" from CVPR 2025 ☆14 · Nov 21, 2025 · Updated 3 months ago
- Official repository for "AM-RADIO: Reduce All Domains Into One" ☆1,665 · Feb 11, 2026 · Updated 2 weeks ago
- ☆27 · Jun 4, 2024 · Updated last year
- [ICCV 2025] Implementation for Describe Anything: Detailed Localized Image and Video Captioning ☆1,453 · Jun 26, 2025 · Updated 8 months ago
- ☆10 · Apr 7, 2025 · Updated 10 months ago
- Code for Scaling Language-Free Visual Representation Learning (WebSSL) ☆245 · Apr 24, 2025 · Updated 10 months ago
- This repository provides the code and model checkpoints for AIMv1 and AIMv2 research projects. ☆1,402 · Aug 4, 2025 · Updated 6 months ago
- EVE Series: Encoder-Free Vision-Language Models from BAAI ☆368 · Jul 24, 2025 · Updated 7 months ago
- [NeurIPS 2025] HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation ☆76 · Sep 19, 2025 · Updated 5 months ago
- Adaptive Length Image Tokenization via Recurrent Allocation | How many tokens is an image worth? ☆145 · Feb 11, 2025 · Updated last year
- Official PyTorch Implementation of "Latent Denoising Makes Good Visual Tokenizers" ☆175 · Feb 24, 2026 · Updated last week
- Empowering Unified MLLM with Multi-granular Visual Generation ☆129 · Jan 16, 2025 · Updated last year
- State-of-the-art Image & Video CLIP, Multimodal Large Language Models, and More! ☆2,177 · Feb 11, 2026 · Updated 2 weeks ago
- [ICCV 2025] Official repo for "GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation" ☆198 · Jan 7, 2026 · Updated last month
- [ICLR 2025] 3DitScene: Editing Any Scene via Language-guided Disentangled Gaussian Splatting ☆258 · Nov 23, 2024 · Updated last year
- Unraveling the Effects of Synthetic Data on End-to-End Autonomous Driving ☆32 · Nov 20, 2025 · Updated 3 months ago
- Official repo for paper "[CLS] Token Tells Everything Needed for Training-free Efficient MLLMs" ☆22 · Apr 23, 2025 · Updated 10 months ago
- ☆14 · Apr 25, 2025 · Updated 10 months ago
- [NeurIPS 2025] Beyond Masked and Unmasked: Discrete Diffusion Models via Partial Masking ☆22 · Oct 22, 2025 · Updated 4 months ago
- [NeurIPS 2025 Spotlight] A Unified Tokenizer for Visual Generation and Understanding ☆512 · Nov 14, 2025 · Updated 3 months ago
- VILA is a family of state-of-the-art vision language models (VLMs) for diverse multimodal AI tasks across the edge, data center, and clou… ☆3,766 · Nov 28, 2025 · Updated 3 months ago
- [ICLR 2025] VILA-U: a Unified Foundation Model Integrating Visual Understanding and Generation ☆418 · Apr 25, 2025 · Updated 10 months ago
- SEED-Voken: A Series of Powerful Visual Tokenizers ☆996 · Nov 25, 2025 · Updated 3 months ago
- [NeurIPS '25 Spotlight] Official PyTorch implementation of "Vision Transformers Don't Need Trained Registers" ☆172 · Sep 19, 2025 · Updated 5 months ago
- CycleReward is a reward model trained on cycle consistency preferences to measure image-text alignment. ☆54 · Nov 3, 2025 · Updated 3 months ago
- [NeurIPS 2025, Spotlight]: Ambient-o: Training Good models with Bad Data. ☆30 · Jan 21, 2026 · Updated last month
- ☆19 · Jun 4, 2025 · Updated 8 months ago
- Pruned CoTracker architecture for tracking the myocardium in 2D echo images. ☆19 · May 6, 2025 · Updated 9 months ago
- This repo contains the code for 1D tokenizer and generator ☆1,117 · Mar 20, 2025 · Updated 11 months ago
- [ICCV 2025 & ICCV 2025 RIWM Outstanding Paper] Aether: Geometric-Aware Unified World Modeling ☆573 · Oct 26, 2025 · Updated 4 months ago
- RayGen: Multi-Modal Dataset Reinforcement for MobileCLIP and MobileCLIP2 ☆39 · Aug 29, 2025 · Updated 6 months ago
- Resa: Transparent Reasoning Models via SAEs ☆47 · Sep 23, 2025 · Updated 5 months ago
- [NeurIPS 2025] Source codes for the paper "MindJourney: Test-Time Scaling with World Models for Spatial Reasoning" ☆133 · Nov 4, 2025 · Updated 3 months ago
- Official Repo For Pixel-LLM Codebase ☆1,543 · Jan 23, 2026 · Updated last month
- [NeurIPS 2024] Official implementation of the paper "Interfacing Foundation Models' Embeddings" ☆131 · Aug 21, 2024 · Updated last year
- DenseFusion-1M: Merging Vision Experts for Comprehensive Multimodal Perception ☆159 · Dec 6, 2024 · Updated last year
- Diffusion Powers Video Tokenizer for Comprehension and Generation (CVPR 2025) ☆86 · Feb 27, 2025 · Updated last year
- [ICCV 2025] Official code for paper: Beyond Text-Visual Attention: Exploiting Visual Cues for Effective Token Pruning in VLMs ☆68 · Jul 1, 2025 · Updated 8 months ago