lillian039 / VARC
☆186 · Updated 2 months ago
Alternatives and similar repositories for VARC
Users that are interested in VARC are comparing it to the libraries listed below
- [ICLR 2026] Uni-CoT: Towards Unified Chain-of-Thought Reasoning Across Text and Vision ☆207 · Updated 2 weeks ago
- Code for "Scaling Language-Free Visual Representation Learning" paper (Web-SSL). ☆200 · Updated 9 months ago
- Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens (arXiv 2025) ☆240 · Updated 6 months ago
- PyTorch implementation of NEPA ☆308 · Updated 2 weeks ago
- https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT ☆117 · Updated last week
- ☆41 · Updated 8 months ago
- [NeurIPS 2025 Oral] Official Code for Exploring Diffusion Transformer Designs via Grafting ☆70 · Updated last month
- [NeurIPS '25 Spotlight] Official Pytorch implementation of "Vision Transformers Don't Need Trained Registers" ☆172 · Updated 4 months ago
- [ICLR 2025] Official implementation and benchmark evaluation repository of <PhysBench: Benchmarking and Enhancing Vision-Language Models … ☆83 · Updated 2 weeks ago
- ☆68 · Updated 3 months ago
- [arXiv: 2502.05178] QLIP: Text-Aligned Visual Tokenization Unifies Auto-Regressive Multimodal Understanding and Generation ☆95 · Updated 11 months ago
- ☆117 · Updated 6 months ago
- ☆97 · Updated 7 months ago
- Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment ☆64 · Updated 6 months ago
- ACTIVE-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO ☆78 · Updated 2 months ago
- Official repo for UAE ☆164 · Updated last month
- Visual Spatial Tuning ☆172 · Updated last week
- Holistic Evaluation of Multimodal LLMs on Spatial Intelligence ☆79 · Updated this week
- Visual Planning: Let's Think Only with Images ☆295 · Updated 8 months ago
- Code for ICML 2025 Paper "Highly Compressed Tokenizer Can Generate Without Training" ☆201 · Updated 8 months ago
- Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning ☆139 · Updated 5 months ago
- Cambrian-S: Towards Spatial Supersensing in Video ☆488 · Updated last month
- High-Resolution Visual Reasoning via Multi-Turn Grounding-Based Reinforcement Learning ☆52 · Updated 6 months ago
- Scaling Spatial Intelligence with Multimodal Foundation Models ☆170 · Updated this week
- ☆58 · Updated 8 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing ☆90 · Updated 6 months ago
- Scaling Vision Pre-Training to 4K Resolution ☆221 · Updated last month
- [ICLR'25] Reconstructive Visual Instruction Tuning ☆135 · Updated 10 months ago
- [CVPR 2025] Test-Time Visual In-Context Tuning ☆29 · Updated last month
- ☆63 · Updated 6 months ago