Dedoc is a library (service) for automated document parsing and conversion to a uniform format. It automatically extracts content, logical structure, tables, and meta information from textual electronic documents. (Parse document; Document content extraction; Logical structure extraction; PDF parser; Scanned document parser; DOCX parser; HTML pars…
☆647Feb 27, 2026Updated this week
Alternatives and similar repositories for dedoc
Users who are interested in dedoc are comparing it to the libraries listed below
- Sydr benchmark applications☆17Jul 25, 2022Updated 3 years ago
- Library for manipulating gdb in batch mode☆21Mar 10, 2024Updated last year
- python package to parse pdfs with different parsers☆248Sep 12, 2025Updated 5 months ago
- A Docker-powered service for PDF document layout analysis. This service provides a powerful and flexible PDF analysis service. The servic…☆1,089Jan 9, 2026Updated last month
- YSC 2023 Papers: A complete collection of research papers, code and data from the International Young Scientists Conference 2023 for youn…☆12Jan 17, 2024Updated 2 years ago
- Convert any PDF into its LaTeX source☆18May 15, 2025Updated 9 months ago
- Markdown Conversion☆371Jun 7, 2025Updated 8 months ago
- An on-premises, OCR-free unstructured data extraction, markdown conversion and benchmarking toolkit. (https://idp-leaderboard.org/)☆1,862Aug 25, 2025Updated 6 months ago
- 🇷🇺 Punctuation restoration production-ready model for Russian language 🇷🇺☆59Jul 9, 2021Updated 4 years ago
- Source code for https://t.me/science_art_at_least_once_a_week channel☆16Jun 15, 2024Updated last year
- A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.☆312Aug 15, 2025Updated 6 months ago
- ContextGem: Effortless LLM extraction from documents☆1,805Feb 22, 2026Updated last week
- The tiniest sentence encoder for Russian language☆247Jul 25, 2024Updated last year
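- 🚀 Fully rebuilt! A paper-reading tool with one-click screenshot AI translation, math formula support, pinned screenshots, window locking, and archive management☆138Feb 14, 2026Updated 2 weeks ago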
- Effective LLM Alignment Toolkit☆152Jun 25, 2025Updated 8 months ago
- OCR, layout analysis, reading order, table recognition in 90+ languages☆19,360Feb 24, 2026Updated last week
- Multi-modal OCR pipeline optimized for ML training (text, figure, math, tables, diagrams)☆682May 20, 2025Updated 9 months ago
- coze api to openai☆15Sep 1, 2024Updated last year
- This is an unofficial ITMO beamer template made by me. Please, feel free to use it and contribute.☆15Oct 10, 2023Updated 2 years ago
- Boost your efficiency with Fish Speech Batch Inference. Easily process multiple texts and achieve consistently great results. 🗨️🐟☆25Aug 4, 2025Updated 7 months ago
- Module for processing reanalysis grids and comparative analysis of time series with meteorological parameters☆21Feb 13, 2021Updated 5 years ago
- [ICLR 2026] SR-Scientist: Scientific Equation Discovery With Agentic AI☆31Jan 27, 2026Updated last month
- A Comprehensive Toolkit for High-Quality PDF Content Extraction☆9,402Jan 3, 2025Updated last year
- Augmentex — a library for augmenting texts with errors☆69Jul 3, 2024Updated last year
- ☆204Updated this week
- RuBLiMP: Russian Benchmark of Linguistic Minimal Pairs☆19Feb 8, 2026Updated 3 weeks ago
- SAGE: Spelling correction, corruption and evaluation for multiple languages☆165Dec 8, 2025Updated 2 months ago
- ExtractThinker is a Document Intelligence library for LLMs, offering ORM-style interaction for flexible and powerful document workflows.☆1,485Aug 27, 2025Updated 6 months ago
- Homelab/SOHO Certificate Authority with age encryption and deployment☆123Jan 19, 2026Updated last month
- Juliet C/C++ Dynamic Test Suite☆35Apr 18, 2023Updated 2 years ago
- Toolkit for linearizing PDFs for LLM datasets/training☆16,947Feb 19, 2026Updated last week
- A lightweight LMM-based Document Parsing Model☆6,511Updated this week
- Yet another common Python wrapper for Alice and Salut skills and bots in Telegram, VK, and Facebook☆28Mar 16, 2023Updated 2 years ago
- Simple package to extract text with coordinates from programmatic PDFs☆245Feb 25, 2026Updated last week
- A sleek dataset viewer built entirely by AI Agent. Supports streaming large files from WebDAV, S3, SSH, Local or Hugging Face.☆624Feb 20, 2026Updated last week
- Schema-Guided Reasoning (SGR) agentic system design created by the neuraldeep community☆1,016Feb 21, 2026Updated last week
- Repository for initial POC NLP based SQL adapter using LLM.☆10May 6, 2025Updated 9 months ago
- T5-based (russian) text normalization☆25Jan 25, 2024Updated 2 years ago
- Transforms complex documents like PDFs into LLM-ready markdown/JSON for your Agentic workflows.☆55,275Updated this week