huridocs/pdf_paragraphs_extraction


huridocs / pdf_paragraphs_extraction

☆49

Alternatives and similar repositories for pdf_paragraphs_extraction

Users that are interested in pdf_paragraphs_extraction are comparing it to the libraries listed below.


huridocs / pdf-reading-order
☆16 · Apr 26, 2024 · Updated 2 years ago
locuslab / scaling_laws_data_filtering
☆64 · Apr 9, 2024 · Updated 2 years ago
johnning2333 / M2Doc
☆43 · Jun 15, 2024 · Updated 2 years ago
CaseDrive / publaynet-models
Trained Detectron2 object detection models for document layout analysis based on PubLayNet dataset
☆28 · Apr 16, 2023 · Updated 3 years ago
RylonW / DocNLC
Official code for DocNLC: A Document Image Enhancement Framework with Normalized and Latent Contrastive Representation for Multiple Degra…
☆44 · Mar 20, 2026 · Updated 4 months ago
naver-ai / trace
TRACE: Table Reconstruction Aligned to Corner and Edges (ICDAR 2023)
☆32 · Mar 13, 2024 · Updated 2 years ago
Prakhar-97 / Table-detection-and-Document-layout-analysis
☆10 · Jun 22, 2020 · Updated 6 years ago
huridocs / pdf-table-of-contents-extractor
This project aims to extract Table of Contents (TOC) information from PDF files using the outputs generated by the pdf-document-layout-an…
☆21 · Feb 3, 2025 · Updated last year
LydiaXiaohongLi / Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
☆19 · Jul 20, 2023 · Updated 3 years ago
liuzhuang1024 / liuzhuang1024
You found a secret! lzmisscc/lzmisscc is a ✨special ✨ repository that you can use to add a README.md to your GitHub profile. Make sure it…
☆13 · Apr 4, 2026 · Updated 3 months ago
BadIdeaFactory / corporate
The corporate repository where we discuss our serious business
☆22 · Mar 9, 2025 · Updated last year
poloclub / tsr-convstem
High-Performance Transformers for Table Structure Recognition Need Early Convolutions
☆45 · Apr 21, 2026 · Updated 3 months ago
ZeningLin / PEneo
[MM'2024] PEneo, an effective algorithm for key-value pair extraction from form-like documents, designed for real-world applications.
☆41 · Apr 7, 2025 · Updated last year
yvrjsharma / JAX
☆13 · Jan 3, 2022 · Updated 4 years ago
JG1VPP / MuTabNet
ICDAR 2024/2026 Table OCR Model
☆39 · Jun 16, 2026 · Updated last month
qurator-spk / sbb_pixelwise_segmentation
Obsolete repo, merged into eynollah
☆12 · Sep 29, 2025 · Updated 10 months ago
maastrichtlawtech / case-law-explorer
☁️ A network analysis software platform for analyzing Dutch and European court decisions.
☆23 · Mar 31, 2026 · Updated 3 months ago
rwightman / genalog
Genalog is an open source, cross-platform python package allowing generation of synthetic document images with custom degradations and te…
☆44 · May 23, 2026 · Updated 2 months ago
yujunhuics / LayoutReader
Reading order, Layoutreader
☆18 · May 8, 2025 · Updated last year
weiji14 / foss4g2023oceania
The ecosystem of geospatial machine learning tools in the Pangeo world.
☆12 · Mar 17, 2025 · Updated last year
rusq / xls2sheets
Import or partially refresh your Google Sheets from Excel files
☆18 · Mar 18, 2026 · Updated 4 months ago
pd3f / dehyphen
📜 Dehyphenation of broken text (mainly German), i.e., extracted from a PDF
☆39 · Mar 8, 2022 · Updated 4 years ago
poloclub / unitable
UniTable: Towards a Unified Table Foundation Model
☆534 · Apr 21, 2026 · Updated 3 months ago
derekgreene / topic-ensemble
Ensemble topic modeling with matrix factorization
☆24 · May 10, 2018 · Updated 8 years ago
lazybootsafe / Audit-Learning
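Notes recording my understanding and summary of the book "Code Audit" (《代码审计》), an in-depth analysis of dangerous functions, and what I have learned from p牛's blog and the code-audit community
☆10 · Feb 27, 2018 · Updated 8 years ago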
wenjieguan / Log-bilinear-language-models
☆18 · Jul 25, 2014 · Updated 12 years ago
GreatV / DocTrPP
DocTr++ in PaddlePaddle
☆57 · Jul 24, 2024 · Updated 2 years ago
FreeOCR-AI / layoutreader
A Faster LayoutReader Model based on LayoutLMv3, Sort OCR bboxes to reading order.
☆323 · Aug 15, 2025 · Updated 11 months ago
NormXU / Layout2Graph
An official implementation of paper "Paragraph2Graph: A Language-independent GNN-based framework for layout analysis"
☆82 · Oct 14, 2023 · Updated 2 years ago
stefan-it / ukrainian-electra
Ukrainian ELECTRA model
☆12 · Mar 11, 2023 · Updated 3 years ago
Knowledgator / utca
Versatile framework designed to streamline the integration of your models, as well as those sourced from Hugging Face, into complex progr…
☆35 · Aug 21, 2025 · Updated 11 months ago
tianchiguaixia / layoutlmv3-chinese
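This project uses layoutlmv3 for training and inference on Chinese images. It mainly addresses three problems: 1. standardizing data into a usable training dataset format; 2. modifying the tokenization for layoutlmv3-base-chinese; 3. splitting text that exceeds a length of 512 and applying a sliding window
☆64 · Sep 6, 2024 · Updated last year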
nutiteq / hellomap3d-ios
iOS sample apps for Nutiteq SDK
☆15 · Dec 23, 2016 · Updated 9 years ago
ajaiau0 / Django-Web-Scraping
Django live web scraping
☆10 · Nov 6, 2019 · Updated 6 years ago
stanford-oval / ovalchat
OVALChat is a customizable Web app aimed at conducting user studies with chatbots
☆28 · Jan 9, 2024 · Updated 2 years ago
DocTron-hub / OCRVerse
OCRVerse: Towards Holistic OCR in End-to-End Vision-Language Models
☆30 · Feb 4, 2026 · Updated 5 months ago
skyming / langmanus
☆14 · Mar 28, 2025 · Updated last year
COP26-Hackathon / Met-Office-Climate-Data-Challenge-March_2021
Overview
☆11 · Mar 26, 2021 · Updated 5 years ago
Zhangxy1999 / LCEM_HSI
This repository is the implementation of our paper: Local Correntropy Matrix Representation for Hyperspectral Image Classification, which…
☆10 · Apr 21, 2022 · Updated 4 years ago