yuanpinz / awesome-deep-multimodal-reasoning
A collection of awesome works revolving around reasoning models like O1/R1 in the visual domain
☆53Jul 21, 2025Updated 6 months ago
Alternatives and similar repositories for awesome-deep-multimodal-reasoning
Users who are interested in awesome-deep-multimodal-reasoning are comparing it to the repositories listed below
- Checkpoints, logs and source code for AAAI-23 paper 'Data-Efficient Image Quality Assessment with Attention-Panel Decoder'☆39Apr 3, 2024Updated last year
- [ECCV2024] Reflective Instruction Tuning: Mitigating Hallucinations in Large Vision-Language Models☆20Jul 17, 2024Updated last year
- This repository demonstrates an application built with Java 21 + Spring Boot 3 + MyBatis, including CRUD operations, authentication, rout…☆12Dec 1, 2024Updated last year
- GroundCUA☆67Dec 24, 2025Updated last month
- Med-R1: Reinforcement Learning for Generalizable Medical Reasoning in Vision-Language Models☆104Jul 7, 2025Updated 7 months ago
- [ACM MM 2024] Improving Composed Image Retrieval via Contrastive Learning with Scaling Positives and Negatives☆39Sep 9, 2025Updated 5 months ago
- Resources and paper list for "Thinking with Images for LVLMs". This repository accompanies our survey on how LVLMs can leverage visual in…☆1,329Feb 3, 2026Updated 2 weeks ago
- ☆13Aug 28, 2024Updated last year
- KeepGPU is a simple CLI app that keeps your GPUs running.☆22Dec 9, 2025Updated 2 months ago
- Building a multi-agent RAG system with advanced RAG methods☆12Jan 12, 2025Updated last year
- Convert data from YOLO format to COCO format☆15Nov 1, 2023Updated 2 years ago
- A simple exam generator and grader written in Python with OpenCV☆14Jan 14, 2026Updated last month
- ☆15Nov 11, 2024Updated last year
- The corresponding code from our paper "Making Reasoning Matter: Measuring and Improving Faithfulness of Chain-of-Thought Reasoning…☆13Jan 14, 2026Updated last month
- VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection☆22May 31, 2025Updated 8 months ago
- P^2HCT: Plug-and-Play Hierarchical C2F Transformer for Multi-Scale Feature Fusion☆19May 19, 2025Updated 8 months ago
- The official implementation of "Enhancing Representation in Radiography-Reports Foundation Model: A Granular Alignment Algorithm Using Ma…☆13Sep 13, 2024Updated last year
- Official Code Repository for the paper "Generating Realistic Images from In-the-wild Sounds", ICCV 2023☆12Aug 24, 2025Updated 5 months ago
- ☆10Aug 15, 2025Updated 6 months ago
- Surrogate Modeling of the Aerodynamic Performance for Transonic Regime☆13Feb 12, 2024Updated 2 years ago
- [Pattern Recognition, 2020] Covariance Descriptors on a Gaussian Manifold and their Application to Image Set Classification☆12May 28, 2022Updated 3 years ago
- ☆28Jan 5, 2026Updated last month
- Code for "YOLOv8-SMOT: An Efficient and Robust Framework for Real-Time Small Object Tracking via Slice-Assisted Training and Adaptive Ass…☆19Nov 5, 2025Updated 3 months ago
- Repository containing dataset, models and code associated with the CHIME project☆17Aug 22, 2024Updated last year
- Speech Security and Privacy Compendium - Mini☆10Jun 18, 2024Updated last year
- This code is for ChaLearn LAP Large-scale Continuous Gesture Recognition Challenge (Round 2) @ICCV 2017☆10Oct 21, 2017Updated 8 years ago
- Advances in recent large vision language models (LVLMs)☆15Sep 23, 2024Updated last year
- awesome-audio-visual-robustness☆11Jan 27, 2024Updated 2 years ago
- ☆10Jun 13, 2017Updated 8 years ago
- Traefik v3 plugin which allows you to pass an API-TOKEN in the header request of the targeted service. Supports whitelisted IP blocks. Lo…☆14Oct 14, 2025Updated 4 months ago
- Code for the "Long Context Needs Some R&R" paper.☆12Mar 11, 2024Updated last year
- Generate a 3D BIM Model from 2D CAD Drawings☆12Nov 23, 2022Updated 3 years ago
- A browser based CadQuery server☆12Feb 18, 2025Updated 11 months ago
- This repository includes the code to reproduce our paper "Raw Differentiable Architecture Search for Speech Deepfake and Spoofing Detecti…☆11Jul 11, 2023Updated 2 years ago
- The official github repo for MixEval-X, the first any-to-any, real-world benchmark.☆16Feb 15, 2025Updated last year
- [ICCV 2025] A Benchmark for Multi-Step Reasoning in Long Narrative Videos☆24Aug 8, 2025Updated 6 months ago
- Official implementation of our NeurIPS 2024 paper "Boundary Matters: A Bi-Level Active Finetuning Method"☆14Feb 11, 2025Updated last year
- CSGAN: Cyclic-Synthesized Generative Adversarial Network For Image-to-Image Transformation☆26Feb 15, 2019Updated 7 years ago
- ☆16Jun 10, 2025Updated 8 months ago