mahtabbigverdi/Aurora-perception

☆50

Alternatives and similar repositories for Aurora-perception

Users that are interested in Aurora-perception are comparing it to the libraries listed below.

arijitray1993 / SAT
Spatial Aptitude Training for Multimodal Language Models
☆33 · Feb 8, 2026 · Updated 5 months ago
STARE-bench / STARE
☆19 · Oct 12, 2025 · Updated 9 months ago
UMass-Embodied-AGI / Mirage
[CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens
☆294 · Aug 2, 2025 · Updated 11 months ago
VincentLeebang / lvr
Official codebase for the paper Latent Visual Reasoning
☆171 · Oct 22, 2025 · Updated 9 months ago
KAIST-Visual-AI-Group / Token-Warping-MLLM
☆23 · Mar 31, 2026 · Updated 3 months ago
McGill-NLP / latentlens
Code and data for the paper "LatentLens: Revealing Highly Interpretable Visual Tokens in LLMs"
☆48 · Mar 31, 2026 · Updated 3 months ago
FYYDCC / IVT-LR
Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”
☆18 · Jan 27, 2026 · Updated 6 months ago
PlusLabNLP / VISCO
[CVPR 2025] VISCO: Benchmarking Fine-Grained Critique and Correction Towards Self-Improvement in Visual Reasoning
☆13 · Jun 7, 2025 · Updated last year
tmtuan1307 / NAYER
[CVPR-2024] NAYER: Noisy Layer Data Generation for Efficient and Effective Data-free Knowledge Distillation
☆16 · Oct 19, 2024 · Updated last year
WPR001 / Ego-ST
☆16 · Sep 25, 2025 · Updated 10 months ago
gogoczh / CoMT
code for "CoMT: A Novel Benchmark for Chain of Multi-modal Thought on Large Vision-Language Models"
☆19 · Mar 10, 2025 · Updated last year
fereenwong / cdViews
official code for "3D Question Answering via only 2D Vision-Language Models"
☆24 · Mar 4, 2026 · Updated 4 months ago
zmzhang2000 / MMMC
Official repository for Robust Multimodal Large Language Models Against Modality Conflict
☆22 · Jul 9, 2025 · Updated last year
LaVi-Lab / VG-LLM
The code for paper 'Learning from Videos for 3D World: Enhancing MLLMs with 3D Vision Geometry Priors'
☆248 · Nov 28, 2025 · Updated 8 months ago
Ugness / ReDi
Official implementation of ReDi: Rectified Discrete Flow (NeurIPS 2025)
☆18 · May 11, 2026 · Updated 2 months ago
kaist-cvml / geometric-distillation
[EMNLP 2025 Findings] 3D-Aware Vision-Language Models Fine-Tuning with Geometric Distillation
☆39 · Jun 12, 2025 · Updated last year
Wakals / CoVT
[ECCV 2026] Official repo of "Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens"
☆379 · Apr 17, 2026 · Updated 3 months ago
KAIST-Visual-AI-Group / APC-VLM
[ICCV 2025] Official code for Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
☆66 · Sep 12, 2025 · Updated 10 months ago
Cogito2012 / OpenMixer
[WACV 2025] Exploiting VLM Localizability and Semantics for Open Vocabulary Action Detection
☆17 · Mar 23, 2025 · Updated last year
mll-lab-nu / Theory-of-Space
THEORY OF SPACE: a benchmark for evaluating whether foundation models can actively explore under partial observability efficiently to bui…
☆85 · Feb 27, 2026 · Updated 5 months ago
VITA-Group / VLM-3R
[CVPR 2026] VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction
☆431 · Jul 15, 2026 · Updated 2 weeks ago
parameterlab / apricot
Source code of "Calibrating Large Language Models Using Their Generations Only", ACL2024
☆22 · Nov 20, 2024 · Updated last year
AntResearchNLP / ViLaSR
[NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing
☆98 · Jul 27, 2025 · Updated last year
DreamMr / HR-Bench
PyTorch Implementation of "Divide, Conquer and Combine: A Training-Free Framework for High-Resolution Image Perception in Multimodal Larg…
☆51 · Mar 2, 2026 · Updated 4 months ago
multimodal-reasoning-lab / Bagel-Zebra-CoT
https://huggingface.co/datasets/multimodal-reasoning-lab/Zebra-CoT
☆137 · Jan 30, 2026 · Updated 5 months ago
cheolhong0916 / contrastive-probing
☆16 · Jun 19, 2026 · Updated last month
penghao-wu / visual_jigsaw
☆78 · Apr 9, 2026 · Updated 3 months ago
kdariina / CLIP-not-BoW-unimodally
Code for "CLIP Behaves like a Bag-of-Words Model Cross-modally but not Uni-modally"
☆29 · Feb 27, 2026 · Updated 5 months ago
hany01rye / tiger
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
☆23 · Nov 18, 2025 · Updated 8 months ago
Haochen-Wang409 / ross
[ICLR'25] Reconstructive Visual Instruction Tuning
☆135 · Apr 9, 2025 · Updated last year
NOVAglow646 / Monet
[CVPR 2026] Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"
☆215 · Mar 19, 2026 · Updated 4 months ago
ZJU-REAL / ViewSpatial-Bench
[ECCV 2026] ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models
☆82 · Mar 9, 2026 · Updated 4 months ago
Nix07 / belief_tracking
This repository contains the code used for the experiments in the paper "Language Models use Lookbacks to Track Beliefs".
☆16 · Mar 14, 2026 · Updated 4 months ago
allenai / SAGE
[arXiv 2025] SAGE: Training Smart Any-Horizon Agents for Long Video Reasoning with Reinforcement Learning
☆70 · Dec 17, 2025 · Updated 7 months ago
jsikyoon / OCRL
Object-Centric-Representation Library (OCRL): This repo is to explore OCR on various downstream tasks from supervised learning tasks to R…
☆12 · Feb 23, 2024 · Updated 2 years ago
KAIST-Visual-AI-Group / PairFlow
[ICLR 2026] Official code for PairFlow: Closed-Form Source-Target Coupling for Few-Step Generation in Discrete Flow Models
☆17 · Jul 3, 2026 · Updated 3 weeks ago
yayafengzi / ALToLLM
ALTo: Adaptive-Length Tokenizer for Autoregressive Mask Generation
☆30 · May 27, 2025 · Updated last year
anthonysimeonov / rpdiff
☆62 · Jan 15, 2024 · Updated 2 years ago
KAIST-Visual-AI-Group / MatLat
[CVPR 2026 Highlight] Official code for MatLat: Material Latent Space for PBR Texture Generation
☆18 · Jul 16, 2026 · Updated last week