☆46Feb 18, 2026Updated last month
Alternatives and similar repositories for Aurora-perception
Users that are interested in Aurora-perception are comparing it to the libraries listed below.
- Spatial Aptitude Training for Multimodal Language Models☆26Feb 8, 2026Updated last month
- [CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning☆13Jun 7, 2025Updated 9 months ago
- ☆16Sep 25, 2025Updated 6 months ago
- code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models"☆19Mar 10, 2025Updated last year
- ☆18Aug 7, 2025Updated 7 months ago
- PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…☆43Mar 2, 2026Updated 3 weeks ago
- ☆12Dec 6, 2024Updated last year
- OmniStream: Mastering Perception, Reconstruction and Action in Continuous Streams☆55Mar 15, 2026Updated 2 weeks ago
- Stress-Testing Image Generation Models with Explainable Human Evaluation on Open-ended Real-World Tasks [ICLR 2026]☆28Mar 13, 2026Updated 2 weeks ago
- [ICLR 2025 Oral] Official Implementation for "Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Un…☆21Oct 24, 2024Updated last year
- Code for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆47Jul 22, 2025Updated 8 months ago
- [ICLR'25] Reconstructive Visual Instruction Tuning☆135Apr 9, 2025Updated 11 months ago
- [ECCV 2024] HandDGP: Camera-Space Hand Mesh Prediction with Differentiable Global Positioning☆40Feb 12, 2025Updated last year
- HandsOnVLM: Vision-Language Models for Hand-Object Interaction Prediction☆41Sep 15, 2025Updated 6 months ago
- Official PyTorch implementation of `[ACMMM 2023]Relational Contrastive Learning for Scene Text Recognition`☆17Sep 22, 2023Updated 2 years ago
- Partial data archive of the Nanjing University Lily BBS (小百合BBS), as of early July 2020; source URL: http://bbs.nju.edu.cn/☆17Nov 2, 2020Updated 5 years ago
- [CVPR 2025] Program synthesis for 3D spatial reasoning☆59Jun 16, 2025Updated 9 months ago
- This is a collection of awesome papers I have read (carefully or roughly) in the fields of computer vision, machine learning, pattern rec…☆25Aug 8, 2024Updated last year
- ☆19Oct 28, 2025Updated 5 months ago
- 🚀 LLM-I: Transform LLMs into natural interleaved multimodal creators! ✨ Tool-use framework supporting image search, generation, code ex…☆41Oct 20, 2025Updated 5 months ago
- ☆24May 23, 2025Updated 10 months ago
- Awesome Vision-Language Compositionality, a comprehensive curation of research papers in literature.☆39Feb 13, 2025Updated last year
- VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model☆15Jul 31, 2025Updated 7 months ago
- A simple and flexible PyTorch implementation of Video StableDiffusion (ZeroScope_v2) based on diffusers.☆20Feb 15, 2024Updated 2 years ago
- ☆16Oct 12, 2025Updated 5 months ago
- A Holistic Embodied Cognition Benchmark☆19Apr 3, 2025Updated 11 months ago
- A curated list of awesome resources on AI Scientists based on our survey "A Comprehensive Survey of AI Scientists".☆29Dec 18, 2025Updated 3 months ago
- [ICCV 2025] The official PyTorch implementation of "LLaVA-SP: Enhancing Visual Representation with Visual Spatial Tokens for MLLMs".☆22Oct 28, 2025Updated 5 months ago
- ☆24Feb 6, 2026Updated last month
- Code implementation of the paper 'FIction: 4D Future Interaction Prediction from Video'☆18Mar 19, 2025Updated last year
- ☆33Feb 7, 2026Updated last month
- ☆47Dec 30, 2024Updated last year
- An in-the-wild benchmark for AI agents in the OpenClaw Environment.☆147Updated this week
- A testbed for agents and environments that can automatically improve models through data generation.☆28Mar 4, 2025Updated last year
- Official codebase for the paper Latent Visual Reasoning☆132Oct 22, 2025Updated 5 months ago
- Repository for the paper "Data Efficient Masked Language Modeling for Vision and Language".☆18Sep 17, 2021Updated 4 years ago
- [ECCV 2024 Oral] Official implementation of the paper "DEVIAS: Learning Disentangled Video Representations of Action and Scene"☆27Nov 15, 2025Updated 4 months ago
- STI-Bench : Are MLLMs Ready for Precise Spatial-Temporal World Understanding?☆38Jan 12, 2026Updated 2 months ago
- Deep learning and Go (weiqi) study☆16Oct 27, 2021Updated 4 years ago