Official codebase for the paper Latent Visual Reasoning
☆120Oct 22, 2025Updated 4 months ago
Alternatives and similar repositories for lvr
Users that are interested in lvr are comparing it to the libraries listed below
Sorting:
- [ICLR 2025] This repo is the official implementation of "The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs".☆13Jan 25, 2025Updated last year
- Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”☆18Jan 27, 2026Updated last month
- [CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens☆246Aug 2, 2025Updated 7 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆32Mar 26, 2025Updated 11 months ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆20Aug 1, 2025Updated 7 months ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"☆66Dec 8, 2025Updated 3 months ago
- Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…☆14Dec 16, 2024Updated last year
- ☆36Jan 13, 2026Updated last month
- [ICLR 2026] SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs☆44Oct 14, 2025Updated 4 months ago
- [CVPR2024] Official implementation of the paper: Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning☆40Aug 15, 2025Updated 6 months ago
- The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).☆43Feb 25, 2026Updated last week
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆77May 23, 2025Updated 9 months ago
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆91Jul 27, 2025Updated 7 months ago
- Official Repository of VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents☆99Feb 2, 2026Updated last month
- Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"☆138Feb 25, 2026Updated last week
- Preference Learning for LLaVA☆59Nov 9, 2024Updated last year
- Learning Situation Hyper-Graphs for Video Question Answering☆22Feb 16, 2024Updated 2 years ago
- Pixel-Level Reasoning Model trained with RL [NeuIPS25]☆279Nov 6, 2025Updated 4 months ago
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…☆24Jan 26, 2025Updated last year
- Fine-tuned LLMs generate accurate 3D human avatars from textual descriptions using the SMPL-X model, enhancing customization and simulati…☆37Feb 5, 2025Updated last year
- ☆26Aug 4, 2020Updated 5 years ago
- ☆22Apr 6, 2021Updated 4 years ago
- a unified reinforcement learning toolbox for joint RL on language models and diffusion models☆75Feb 7, 2026Updated last month
- ☆46Feb 18, 2026Updated 2 weeks ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning☆82Sep 19, 2025Updated 5 months ago
- [NeurIPS 2025 Spotlight] FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities☆69Dec 21, 2025Updated 2 months ago
- code for FineLIP☆38Nov 25, 2025Updated 3 months ago
- brain to speech☆10Nov 7, 2025Updated 4 months ago
- [ACL 2025] The official pytorch implement of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".☆25May 26, 2025Updated 9 months ago
- A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs).☆41May 22, 2025Updated 9 months ago
- [ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"☆177Feb 4, 2026Updated last month
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆129Apr 4, 2025Updated 11 months ago
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]]☆45Jul 22, 2025Updated 7 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆78Feb 13, 2026Updated 3 weeks ago
- [NeurIPS 2024] Official implementation of InterControl☆83Feb 20, 2025Updated last year
- official code for "Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval"☆42Jul 4, 2025Updated 8 months ago
- 哈尔滨工业大学2023春季学期编译系统课程实验、习题、课件以及期末复习材料☆11Jul 30, 2023Updated 2 years ago
- [CVPR'2025] EntitySAM: Segment Everything in Video☆59Jul 13, 2025Updated 7 months ago
- [CVPR 2025] Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Att…☆64Oct 9, 2025Updated 5 months ago