Official codebase for the paper Latent Visual Reasoning
☆120Oct 22, 2025Updated 4 months ago
Alternatives and similar repositories for lvr
Users who are interested in lvr are comparing it to the libraries listed below
- [ICLR 2025] This repo is the official implementation of "The Labyrinth of Links: Navigating the Associative Maze of Multi-modal LLMs".☆13Jan 25, 2025Updated last year
- Official repository for “Reasoning in the Dark: Interleaved Vision-Text Reasoning in Latent Space”☆18Jan 27, 2026Updated last month
- [CVPR 2026] Machine Mental Imagery: Empower Multimodal Reasoning with Latent Visual Tokens☆246Aug 2, 2025Updated 7 months ago
- Code for "CAFe: Unifying Representation and Generation with Contrastive-Autoregressive Finetuning"☆32Mar 26, 2025Updated 11 months ago
- Bidirectional Likelihood Estimation with Multi-Modal Large Language Models for Text-Video Retrieval (ICCV 2025 Highlight)☆20Aug 1, 2025Updated 7 months ago
- [AAAI 2026 Oral] The official code of "UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning"☆66Dec 8, 2025Updated 3 months ago
- Official pytorch implementation of "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language…☆14Dec 16, 2024Updated last year
- ☆36Jan 13, 2026Updated last month
- ☆47Jan 26, 2026Updated last month
- ☆54Feb 9, 2026Updated last month
- [CVPR2024] Official implementation of the paper: Skeleton-in-Context: Unified Skeleton Sequence Modeling with In-Context Learning☆40Aug 15, 2025Updated 6 months ago
- The code implementation for UME-R1: Exploring Reasoning-Driven Generative Multimodal Embeddings (ICLR 2026).☆43Feb 25, 2026Updated last week
- [ICLR 2026] SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs☆44Oct 14, 2025Updated 4 months ago
- LLaVE: Large Language and Vision Embedding Models with Hardness-Weighted Contrastive Learning☆77May 23, 2025Updated 9 months ago
- Awesome-Parallel-Reasoning: Unlocking the reasoning potential of LLMs. Papers, Code, Resources & Survey.☆48Updated this week
- Official implementation of "MMNeuron: Discovering Neuron-Level Domain-Specific Interpretation in Multimodal Large Language Model". Our co…☆25Dec 20, 2024Updated last year
- [NeurIPS 2025] Reinforcing Spatial Reasoning in Vision-Language Models with Interwoven Thinking and Visual Drawing☆91Jul 27, 2025Updated 7 months ago
- Official codes of "Monet: Reasoning in Latent Visual Space Beyond Image and Language"☆138Feb 25, 2026Updated last week
- Learning Situation Hyper-Graphs for Video Question Answering☆22Feb 16, 2024Updated 2 years ago
- Official Implementation (Pytorch) of the "VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Capti…☆24Jan 26, 2025Updated last year
- Fine-tuned LLMs generate accurate 3D human avatars from textual descriptions using the SMPL-X model, enhancing customization and simulati…☆37Feb 5, 2025Updated last year
- ☆46Feb 18, 2026Updated 2 weeks ago
- a unified reinforcement learning toolbox for joint RL on language models and diffusion models☆75Feb 7, 2026Updated last month
- ☆22Apr 6, 2021Updated 4 years ago
- ☆26Aug 4, 2020Updated 5 years ago
- [NeurIPS 2025 Spotlight] Think or Not Think: A Study of Explicit Thinking in Rule-Based Visual Reinforcement Fine-Tuning☆82Sep 19, 2025Updated 5 months ago
- Official Implementation for the paper "Integrative Decoding: Improving Factuality via Implicit Self-consistency"☆32Apr 12, 2025Updated 10 months ago
- [NeurIPS 2025 Spotlight] FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities☆69Dec 21, 2025Updated 2 months ago
- [ICLR'25] Official code for the paper 'MLLMs Know Where to Look: Training-free Perception of Small Visual Details with Multimodal LLMs'☆349Apr 20, 2025Updated 10 months ago
- Bidirectional Mapping between Action Physical-Semantic Space☆34Sep 7, 2025Updated 6 months ago
- [ACL 2025] The official PyTorch implementation of "MIND: A Multi-agent Framework for Zero-shot Harmful Meme Detection".☆25May 26, 2025Updated 9 months ago
- A library of visualization tools for the interpretability and hallucination analysis of large vision-language models (LVLMs).☆41May 22, 2025Updated 9 months ago
- [ICLR 2026] An official implementation of "SIM-CoT: Supervised Implicit Chain-of-Thought"☆177Feb 4, 2026Updated last month
- [ACL 2024 Findings] "TempCompass: Do Video LLMs Really Understand Videos?", Yuanxin Liu, Shicheng Li, Yi Liu, Yuxiang Wang, Shuhuai Ren, …☆129Apr 4, 2025Updated 11 months ago
- Codes for ReFocus: Visual Editing as a Chain of Thought for Structured Image Understanding [ICML 2025]☆45Jul 22, 2025Updated 7 months ago
- Official repository for "Visual Generation Unlocks Human-Like Reasoning through Multimodal World Models", https://arxiv.org/abs/2601.1983…☆78Feb 13, 2026Updated 3 weeks ago
- [NeurIPS 2024] Official implementation of InterControl☆83Feb 20, 2025Updated last year
- Official repo for UAE☆170Dec 29, 2025Updated 2 months ago
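- Labs, exercises, lecture slides, and final review materials for the Spring 2023 Compiler Systems course at Harbin Institute of Technology☆11Jul 30, 2023Updated 2 years ago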