facebookresearch / pixio
Pixio: a capable vision encoder dedicated to dense prediction, simply by pixel reconstruction
☆342 · Updated 2 weeks ago
Alternatives and similar repositories for pixio
Users who are interested in pixio are comparing it to the repositories listed below.
- Scaling Vision Pre-Training to 4K Resolution ☆221 · Updated last month
- Official implementation of DepthLM ☆290 · Updated last week
- [NeurIPS 2025] Official code for JAFAR: Jack up Any Feature at Any Resolution ☆215 · Updated 2 months ago
- [ICCV'25 oral] Official Code for "LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models" ☆248 · Updated 3 weeks ago
- [ICML 2025] Official Implementation for SimDINO/SimDINOv2 ☆192 · Updated 10 months ago
- ☆122 · Updated 5 months ago
- Towards training VQ-VAE models robustly! ☆91 · Updated 6 months ago
- Official implementation for "What matters for Representation Alignment: Global Information or Spatial Structure?" ☆209 · Updated last month
- Visual Spatial Tuning ☆171 · Updated this week
- SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding ☆60 · Updated 6 months ago
- A Large-scale Video Action Dataset ☆376 · Updated 2 weeks ago
- [CVPR 2025] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention ☆177 · Updated 11 months ago
- Official PyTorch implementation of FlowMo ☆110 · Updated 9 months ago
- Generative World Explorer ☆165 · Updated 7 months ago
- PyTorch implementation of NEPA ☆303 · Updated last week
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆160 · Updated 3 weeks ago
- [CVPR 2024] Probing the 3D Awareness of Visual Foundation Models ☆348 · Updated 2 months ago
- Code for the "Scaling Language-Free Visual Representation Learning" paper (Web-SSL) ☆200 · Updated 9 months ago
- A list of works on video generation towards world models ☆334 · Updated this week
- [NeurIPS 2025] Official Implementation of DINO-Foresight: Looking into the Future with DINO ☆146 · Updated 2 months ago
- Scene-Centric Unsupervised Panoptic Segmentation (CVPR 2025 Highlight) ☆80 · Updated 4 months ago
- Official repo for UAE ☆161 · Updated last month
- SpatialVID: A Large-Scale Video Dataset with Spatial Annotations ☆486 · Updated last week
- Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models ☆167 · Updated 3 months ago
- (ICCV 2025) ReferDINO: Referring Video Object Segmentation with Visual Grounding Foundations ☆127 · Updated 2 months ago
- Official PyTorch implementation of GaussianToken: An Effective Image Tokenizer with 2D Gaussian Splatting ☆101 · Updated 10 months ago
- Public release of the code for "Accelerating Vision Transformers with Adaptive Patches" ☆90 · Updated 2 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆95 · Updated 11 months ago
- TIPS (ICLR'25): Text-Image Pretraining with Spatial Awareness ☆115 · Updated 9 months ago
- [ECCV 2024] Improving 2D Feature Representations by 3D-Aware Fine-Tuning ☆307 · Updated last month