lillian039 / VARC
☆186Updated 2 months ago
Alternatives and similar repositories for VARC
Users that are interested in VARC are comparing it to the libraries listed below
- Official repo for UAE☆164Updated last month
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting☆70Updated last month
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL).☆201Updated 9 months ago
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision☆207Updated 2 weeks ago
- [NeurIPS '25 Spotlight] Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers"☆172Updated 4 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025)☆240Updated 6 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT☆117Updated last week
- PyTorch implementation of NEPA☆308Updated 2 weeks ago
- Visual Planning: Let's Think Only with Images☆295Updated 8 months ago
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training"☆201Updated 8 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models …☆83Updated 3 weeks ago
- Cambrian-S: Towards Spatial Supersensing in Video☆488Updated last month
- Visual Spatial Tuning☆172Updated last week
- ☆97Updated 7 months ago
- Official Implementation of pMF https://arxiv.org/abs/2601.22158☆91Updated this week
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation☆95Updated 11 months ago
- ☆41Updated 8 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling☆41Updated 11 months ago
- ☆117Updated 6 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models☆170Updated this week
- Scaling Vision Pre-Training to 4K Resolution☆221Updated last month
- ☆68Updated 3 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment☆64Updated 6 months ago
- [CVPR 2025] Program synthesis for 3D spatial reasoning☆56Updated 7 months ago
- Thinking with Videos from Open-Source Priors. We reproduce chain-of-frames visual reasoning by fine-tuning open-source video models. Give…☆207Updated 3 months ago
- ☆163Updated last year
- [ICML 2024] Compositional Image Decomposition with Diffusion Models☆53Updated last year
- A collection of vision foundation models unifying understanding and generation.☆59Updated last year
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆90Updated 6 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning☆52Updated 6 months ago