facebookresearch / pixio
Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction
☆344 Updated 3 weeks ago
Alternatives and similar repositories for pixio
Users who are interested in pixio are comparing it to the repositories listed below
- Official implementation of DepthLM ☆305 Updated this week
- Scaling Vision Pre-Training to 4K Resolution ☆221 Updated last month
- [NeurIPS 2025] Official code for JAFAR: Jack up Any Feature at Any Resolution ☆216 Updated 2 months ago
- [ICCV'25 oral] Official Code for "LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models" ☆249 Updated last month
- ☆124 Updated 5 months ago
- Official implementation for What matters for Representation Alignment: Global Information or Spatial Structure? ☆216 Updated last month
- Visual Spatial Tuning ☆172 Updated last week
- Generative World Explorer ☆165 Updated 7 months ago
- Towards training VQ-VAE models robustly! ☆91 Updated 6 months ago
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆60 Updated 7 months ago
- Multi-SpatialMLLM Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆167 Updated 4 months ago
- [NeurIPS 2025] The official repository of "Sekai: A Video Dataset towards World Exploration" ☆258 Updated last month
- [ICML 2025] Official Implementation for SimDINO/SimDINOv2 ☆192 Updated 10 months ago
- A Large-scale Video Action Dataset ☆388 Updated 3 weeks ago
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL). ☆201 Updated 9 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆170 Updated last week
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ☆497 Updated this week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆95 Updated 11 months ago
- Official PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆101 Updated 10 months ago
- [CVPR 2024] Probing the 3D Awareness of Visual Foundation Models ☆348 Updated 2 months ago
- Public release of the code for "Accelerating Vision Transformers with Adaptive Patches" ☆90 Updated 3 months ago
- PyTorch implementation of NEPA ☆308 Updated 2 weeks ago
- [ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning ☆309 Updated last month
- [ICLR 2026] 🐻 Uniform Discrete Diffusion with Metric Path for Video Generation ☆102 Updated this week
- A list of works on video generation towards world model ☆337 Updated this week
- Official repo for UAE ☆164 Updated last month
- (ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations ☆128 Updated 2 months ago
- [CVPR 2025] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention ☆178 Updated 11 months ago
- [Arxiv'25] DINO-Tok: Adapting DINO for Visual Tokenizers ☆35 Updated 2 months ago
- Scaling Properties of Diffusion Models For Perceptual Tasks (CVPR 2025) ☆44 Updated 9 months ago