ispras/dedoc

Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML pars…

☆715

Alternatives and similar repositories for dedoc

Users who are interested in dedoc are comparing it to the libraries listed below.

ispras / sydr-benchmark
Sydr benchmark applications
☆17 · Jul 25, 2022 · Updated 4 years ago
huridocs / pdf-document-layout-analysis
A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic…
☆1,273 · Jul 13, 2026 · Updated last week
NanoNets / docext
An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)
☆2,032 · Mar 17, 2026 · Updated 4 months ago
chatclimate-ai / ParseStudio
Python package to parse PDFs with different parsers
☆269 · Sep 12, 2025 · Updated 10 months ago
Travvy88 / DocumentGenerator_DoGe
Synthetic Document Generator for Document AI. Creates document images annotated with text and bounding boxes of each word. Images contain…
☆33 · Jul 23, 2025 · Updated last year
shcherbak-ai / contextgem
ContextGem: Effortless LLM extraction from documents
☆1,863 · Jun 6, 2026 · Updated last month
Ruiyang-061X / Awesome-Search-RL
☆44 · Jun 10, 2025 · Updated last year
itmo-ai / YSC-2023-Papers
YSC 2023 Papers: A complete collection of research papers, code and data from the International Young Scientists Conference 2023 for youn…
☆12 · Jan 17, 2024 · Updated 2 years ago
chatdoc-com / OCRFlux
OCRFlux is a lightweight yet powerful multimodal toolkit that significantly advances PDF-to-Markdown conversion, excelling in complex lay…
☆2,523 · Apr 14, 2026 · Updated 3 months ago
Yuliang-Liu / MonkeyOCR
A lightweight LMM-based Document Parsing Model
☆6,607 · Updated this week
MarkPDFdown / markpdfdown
A high-quality PDF to Markdown tool based on large language model visual recognition.
☆1,930 · Jan 25, 2026 · Updated 6 months ago
ysm-dev / cpdown
📥 cpdown - Copy to clipboard any webpage content/youtube subtitle as clean markdown with one click or shortcut
☆563 · Updated this week
xxnuo / serverless-markdown-convertor
Markdown Conversion
☆373 · Jun 7, 2025 · Updated last year
kyryl-opens-ml / no-ocr
https://no-ocr.com/about
☆182 · Jun 30, 2025 · Updated last year
oomol-lab / pdf-craft
PDF craft can convert PDF files into various other formats. This project will focus on processing PDF files of scanned books.
☆6,024 · Jun 27, 2026 · Updated 3 weeks ago
landing-ai / agentic-doc
Legacy Python library for Agentic Document Extraction (ADE). Use the landingai-ade library for all new projects.
☆2,395 · Mar 24, 2026 · Updated 4 months ago
pavviaz / DeepScriptum
Convert any PDF into its LaTeX source
☆18 · May 15, 2025 · Updated last year
SibNN / asr_eval
Evaluation tools for Automatic Speech Recognition (ASR), model and dataset collection
☆31 · Mar 9, 2026 · Updated 4 months ago
ChrisLisbon / TorchCNNBuilder
Framework for the automatic creation of CNN architectures
☆37 · Nov 21, 2025 · Updated 8 months ago
datalab-to / surya
OCR, layout analysis, reading order, table recognition in 90+ languages
☆21,149 · Updated this week
murtaza-nasir / speakr
Speakr is a personal, self-hosted web application designed for transcribing audio recordings
☆3,534 · Jul 15, 2026 · Updated last week
johnson7788 / MultiAgentPPT
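MultiAgentPPT is an intelligent presentation generation system that integrates an A2A (Agent2Agent) + MCP (Model Context Protocol) + ADK (Agent Development Kit) architecture and, through multi-agent collaboration and a streaming concurrency mechanism, supports…
☆1,621 · Jul 16, 2026 · Updated last week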
stardustai / dataset-viewer
A sleek dataset viewer built entirely by AI Agent. Supports streaming large files from WebDAV, S3, SSH, Local or Hugging Face.
☆941 · Mar 28, 2026 · Updated 3 months ago
guy-hartstein / company-research-agent
An agentic company research tool powered by LangGraph and Tavily that conducts deep diligence on companies using a multi-agent framework.…
☆2,181 · Updated this week
murtaza-nasir / pdf3md
A modern, user-friendly web application that converts PDF documents to clean, formatted Markdown text.
☆401 · Jun 16, 2025 · Updated last year
raphael-seo / Versatile-OCR-Program
Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)
☆677 · May 13, 2026 · Updated 2 months ago
allenai / olmocr
Toolkit for linearizing PDFs for LLM datasets/training
☆19,182 · Mar 25, 2026 · Updated 4 months ago
jrzkaminski / ITMO-beamer
This is an unofficial ITMO beamer template made by me. Please, feel free to use it and contribute.
☆15 · Oct 10, 2023 · Updated 2 years ago
robert-mcdermott / ai-knowledge-graph
AI Powered Knowledge Graph Generator
☆2,551 · Dec 28, 2025 · Updated 6 months ago
plait-board / drawnix
All-in-one open-source whiteboard tool (SaaS) with mind maps, flowcharts, freehand drawing, and more.
☆14,331 · Jul 17, 2026 · Updated last week
panyanyany / Twocast
AI podcast generator for bilingual episodes with human-like dialogue; multiple languages and multiple voices; an alternative to NotebookLLM
☆1,248 · Jul 1, 2025 · Updated last year
andreygetmanov / science_art_at_least_once_a_week
Source code for https://t.me/science_art_at_least_once_a_week channel
☆16 · Jun 15, 2024 · Updated 2 years ago
bytedance / Dolphin
The official repo for “Dolphin: Document Image Parsing via Heterogeneous Anchor Prompting”, ACL, 2025.
☆9,039 · Mar 25, 2026 · Updated 4 months ago
vamplabAI / sgr-agent-core
Schema-Guided Reasoning (SGR) has agentic system design created by neuraldeep community
☆1,114 · Jul 16, 2026 · Updated last week
funstory-ai / BabelDOC
Yet Another Document Translator
☆8,996 · Jul 16, 2026 · Updated last week
yobix-ai / extractous
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
☆1,768 · Dec 21, 2024 · Updated last year
alibaba / Logics-Parsing
☆1,394 · May 13, 2026 · Updated 2 months ago
davialabs / davia-app-builder-py
The easiest way to build apps from your Python code
☆574 · Oct 17, 2025 · Updated 9 months ago
opendatalab / PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
☆9,806 · Jan 3, 2025 · Updated last year