facebookresearch / pixio
Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction
☆316 · Updated 2 weeks ago
Alternatives and similar repositories for pixio
Users who are interested in pixio are comparing it to the libraries listed below.
- Official implementation of DepthLM ☆283 · Updated this week
- Scaling Vision Pre-Training to 4K Resolution ☆217 · Updated this week
- Towards training VQ-VAE models robustly! ☆91 · Updated 5 months ago
- [NeurIPS 2025] Official code for JAFAR: Jack up Any Feature at Any Resolution ☆213 · Updated last month
- ☆118 · Updated 4 months ago
- Official implementation for "What matters for Representation Alignment: Global Information or Spatial Structure?" ☆168 · Updated 3 weeks ago
- [NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration" ☆235 · Updated last week
- [ICCV'25 oral] Official Code for "LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models" ☆243 · Updated 2 months ago
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆164 · Updated 3 months ago
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ☆465 · Updated 3 weeks ago
- Generative World Explorer ☆165 · Updated 6 months ago
- A list of works on video generation towards world model ☆313 · Updated this week
- [CVPR 2025] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention ☆176 · Updated 10 months ago
- [ICML 2025] Official Implementation for SimDINO/SimDINOv2 ☆182 · Updated 9 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆145 · Updated last week
- PyTorch implementation of NEPA ☆262 · Updated 2 weeks ago
- PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆101 · Updated 9 months ago
- Visual Spatial Tuning ☆161 · Updated this week
- Orient Anything, ICML 2025 ☆367 · Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆94 · Updated 10 months ago
- Official repo for UAE ☆125 · Updated last week
- [CVPR 2024] Probing the 3D Awareness of Visual Foundation Models ☆341 · Updated last month
- NeoVerse: Enhancing 4D World Model with in-the-wild Monocular Videos ☆200 · Updated this week
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling ☆409 · Updated 3 weeks ago
- ☆81 · Updated 3 weeks ago
- (3DV 2026 Oral) L4P: a feed-forward foundational model designed for multiple low-level 4D vision perception tasks. ☆51 · Updated last month
- [ICLR 2025] Official Implementation of M3: 3D-Spatial Multimodal Memory ☆196 · Updated 8 months ago
- Scene-Centric Unsupervised Panoptic Segmentation (CVPR 2025 Highlight) ☆78 · Updated 3 months ago
- Scaling Properties of Diffusion Models For Perceptual Tasks (CVPR 2025) ☆44 · Updated 8 months ago
- [arXiv'25] MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization ☆54 · Updated 3 months ago