Hxyz-123 / ReasoningOCR
☆17 Jul 24, 2025 Updated 6 months ago
Alternatives and similar repositories for ReasoningOCR
Users who are interested in ReasoningOCR are comparing it to the libraries listed below.
- [arXiv: 2505.12307] LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images? ☆34 Dec 1, 2025 Updated 2 months ago
- The official repo for the technical report "Scalable Mask Annotation for Video Text Spotting" ☆16 May 3, 2023 Updated 2 years ago
- [NeurIPS'24] GoMatching: A Simple Baseline for Video Text Spotting via Long and Short Term Matching ☆28 May 29, 2025 Updated 8 months ago
- [ISBI 2025] XLSTM-HVED: Cross-Modal Brain Tumor Segmentation and MRI Reconstruction Method Using Vision XLSTM and Heteromodal Variational… ☆17 Jul 9, 2025 Updated 7 months ago
- [ICLR 2026] OCR-Reasoning Benchmark: Unveiling the True Capabilities of MLLMs in Complex Text-Rich Image Reasoning ☆72 Dec 17, 2025 Updated last month
- Official implementation of UPOCR: Towards unified pixel-level OCR interface (ICML 2024) ☆66 Jun 6, 2024 Updated last year
- This is the pytorch implementation of FCL-Net, accepted by NN'2022. ☆14 May 25, 2022 Updated 3 years ago
- Official repo for "AnesSuite: A Comprehensive Benchmark and Dataset Suite for Anesthesiology Reasoning in LLMs" ☆22 Jan 18, 2026 Updated 3 weeks ago
- Unified Instance and Knowledge Alignment Pretraining for Aspect-based Sentiment Analysis ☆17 Mar 27, 2023 Updated 2 years ago
- [EMNLP22] Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models ☆22 Mar 27, 2023 Updated 2 years ago
- ☆31 Apr 8, 2025 Updated 10 months ago
- ☆21 Dec 12, 2025 Updated 2 months ago
- A Survey of Multimodal Retrieval-Augmented Generation ☆20 Nov 3, 2025 Updated 3 months ago
- ☆22 May 30, 2023 Updated 2 years ago
- Source code of COLING 2022 paper "A Contrastive Cross-channel Data Augmentation Framework for Aspect-based Sentiment Analysis" ☆22 Feb 18, 2023 Updated 2 years ago
- ☆27 Nov 29, 2023 Updated 2 years ago
- The official codes and datasets for Artistic Text Segmentation (ECCV 2024). ☆28 Sep 24, 2025 Updated 4 months ago
- [ACL 2025 main] The official GitHub page of "Reviving Cultural Heritage: A Novel Approach for Comprehensive Historical Document Restorati… ☆53 Dec 22, 2025 Updated last month
- ☆17 Sep 23, 2025 Updated 4 months ago
- Official Implementation of TFLOP: Table Structure Recognition Framework with Layout Pointer Mechanism ☆47 Aug 25, 2025 Updated 5 months ago
- Official implementation of USR (NeurIPS 2024) ☆39 Dec 21, 2024 Updated last year
- ☆27 Updated this week
- [TIP2025] The implementation of "Uncertainty Guided Refinement for Fine-grained Salient Object Detection" ☆15 Apr 20, 2025 Updated 9 months ago
- [IJCAI-2024] The official code of Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition ☆10 Aug 10, 2025 Updated 6 months ago
- This is the official repository for the paper "Modeling Human Gaze Behavior with Diffusion Models for Unified Scanpath Prediction". ICCV … ☆23 Dec 4, 2025 Updated 2 months ago
- A benchmark dataset designed to support the development and evaluation of large language models (LLMs) for conversational mental health a… ☆17 Feb 24, 2025 Updated 11 months ago
- ☆11 Sep 25, 2022 Updated 3 years ago
- A semi print-in-place hand for human-like manipulation, designed to be built by anyone. ☆17 Jan 5, 2026 Updated last month
- Code repository supporting the paper "Auto-Generating Weak Labels for Real & Synthetic Data to Improve Label-Scarce Medical Image Segment… ☆11 Apr 29, 2024 Updated last year
- The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++:… ☆282 May 30, 2025 Updated 8 months ago
- The code for On Robust Cross-View Consistency in Outdoor Self-Supervised Monocular Depth Estimation ☆13 Jun 2, 2023 Updated 2 years ago
- ☆11 Aug 29, 2025 Updated 5 months ago
- [NeurIPS 25] SwitchLingua: The First Large-Scale Multilingual and Multi-Ethnic Code-Switching Dataset ☆16 Sep 19, 2025 Updated 4 months ago
- High-performance ASR tool using Faster Whisper, supporting custom models, multi-language transcription, and real-time processing feedback… ☆10 Sep 17, 2025 Updated 4 months ago
- (WWW'25 + Netflix) The first CRS that retrieves collaborative filtering knowledge with two-step context-aware reflection. ☆18 Sep 10, 2025 Updated 5 months ago
- Open-source reproducible benchmarks from Argmax ☆77 Jan 19, 2026 Updated 3 weeks ago
- Audio-Visual Perception of Omnidirectional Video for Virtual Reality Applications ☆15 Feb 22, 2023 Updated 2 years ago
- Curated LLM (ICML 2024) ☆14 Oct 23, 2024 Updated last year
- Core ML Demos is an experimental Core ML app. It visualizes the inference results of ML models and can be used to benchmark ML models and… ☆12 Jan 8, 2026 Updated last month