lillian039 / VARC
☆182 · Updated 2 months ago
Alternatives and similar repositories for VARC
Users interested in VARC are comparing it to the libraries listed below.
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆205 · Updated this week
- PyTorch implementation of NEPA ☆296 · Updated last month
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL). ☆199 · Updated 9 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆233 · Updated 5 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆95 · Updated 10 months ago
- PhysGame Benchmark for Physical Commonsense Evaluation in Gameplay Videos ☆47 · Updated 6 months ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 · Updated 2 months ago
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 6 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆159 · Updated 2 weeks ago
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting ☆70 · Updated 3 weeks ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆83 · Updated last week
- Official repo for UAE ☆155 · Updated last month
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" ☆198 · Updated 7 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆482 · Updated last month
- Visual Spatial Tuning ☆169 · Updated 3 weeks ago
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆74 · Updated last week
- ☆66 · Updated 2 months ago
- ☆114 · Updated 6 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆64 · Updated 6 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆77 · Updated 2 months ago
- ☆41 · Updated 7 months ago
- [NeurIPS '25 Spotlight] Official PyTorch implementation of "Vision Transformers Don't Need Trained Registers" ☆168 · Updated 4 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆90 · Updated 6 months ago
- Official implementation of Next Block Prediction: Video Generation via Semi-Autoregressive Modeling ☆41 · Updated 11 months ago
- Visual Planning: Let's Think Only with Images ☆294 · Updated 8 months ago
- Official PyTorch implementation of FlowMo. ☆110 · Updated 9 months ago
- TIPS (ICLR'25): Text-Image Pretraining with Spatial Awareness ☆115 · Updated 9 months ago
- ☆96 · Updated 7 months ago
- [CVPR 2025] Program synthesis for 3D spatial reasoning ☆54 · Updated 7 months ago
- Code for MetaMorph: Multimodal Understanding and Generation via Instruction Tuning ☆232 · Updated last week