ChrizH/pdfstructure

`pdfstructure` detects, splits and organizes a document's text content into its natural structure, as envisioned by the author.

☆106

Alternatives and similar repositories for pdfstructure

Users that are interested in pdfstructure are comparing it to the libraries listed below.

DS3Lab / DocParser
☆83 · Apr 12, 2022 · Updated 4 years ago
JustlyAI / lmss_entity_extractor
Tool to apply the Legal Matter Specification Standard (LMSS) to documents
☆12 · Aug 15, 2024 · Updated last year
davidissamattos / bpcs
bpcs - Bayesian Paired Comparison in Stan
☆12 · Mar 14, 2024 · Updated 2 years ago
heraclex12 / Viwiki-spelling
A dataset for Vietnamese Spelling Correction
☆17 · Sep 27, 2021 · Updated 4 years ago
alea-institute / kl3m-data
KL3M training data collection and preprocessing
☆22 · Apr 14, 2025 · Updated last year
jstockwin / py-pdf-parser
A Python tool to help extract information from structured PDFs.
☆427 · May 25, 2026 · Updated last month
songys / single_turn_dialogue
Dialogue example sentences extracted from a dictionary
☆16 · Apr 24, 2023 · Updated 3 years ago
Alignment-Lab-AI / datagen
A pipeline for using API calls to agnostically convert unstructured data into structured training data
☆32 · Sep 22, 2024 · Updated last year
kkdai / pdf_online_editor
A simple web application built with Streamlit that allows users to upload a PDF file and display its pages as images. Users can select a …
☆15 · Jan 4, 2024 · Updated 2 years ago
tweedmann / 3x8emotions
Code and models for 3 different tools to measure appeals to 8 discrete emotions in German political text
☆16 · Jun 29, 2022 · Updated 4 years ago
stefan-it / europeana-bert
BERT and ELECTRA models trained on Europeana Newspapers
☆39 · Dec 14, 2021 · Updated 4 years ago
davidycliao / legisCrawler
An automation web crawler based on the Selenium library for retrieving parliamentary questions from the website of Taiwan's Legislative Yuan (http…
☆11 · Jun 8, 2023 · Updated 3 years ago
mlr-org / bbotk
Black-box optimization framework for R.
☆26 · Updated this week
text-analytics-20 / news-sentiment-development
Analyzing the sentiment development of news articles with the topic "migration" over time.
☆12 · May 25, 2022 · Updated 4 years ago
pablobarbera / POIR613
Course materials: POIR 613 - Computational Social Science - USC Fall 2022
☆20 · Nov 1, 2022 · Updated 3 years ago
VRP-REP / translator
Java tool to translate VRP instances to the VRP-REP unified format.
☆11 · Nov 28, 2014 · Updated 11 years ago
IPPSR / CongressData
A Tool for the Congress Data dataset
☆26 · Dec 8, 2025 · Updated 6 months ago
factset / enterprise-sdk-utils-python
Utilities that support FactSet's SDK in Python
☆13 · Updated this week
edrubin / EC524W22
Masters-level applied econometrics course—focusing on prediction—at the University of Oregon (EC424/524 during Winter quarter, 2022) Taug…
☆19 · Mar 15, 2022 · Updated 4 years ago
houseofcommonslibrary / clcharts
Themes, colors and tools for making charts with ggplot2 in the House of Commons Library style
☆22 · Jun 4, 2026 · Updated 3 weeks ago
UBIAI / layout_lm_tutorial
☆15 · Jun 16, 2021 · Updated 5 years ago
PyAntony / hate-speech
BERT language model for hate speech detection.
☆21 · Aug 6, 2020 · Updated 5 years ago
jsavelka / sbd_adjudicatory_dec
☆20 · Jun 11, 2021 · Updated 5 years ago
qnixsynapse / rich-chat
Python console application designed to provide an engaging and visually appealing LLM chat experience on Unix-like consoles or terminals.
☆25 · May 20, 2026 · Updated last month
pstutz / syncodia
Bridging Large Language Models with Scala 3 Functions
☆11 · Aug 31, 2024 · Updated last year
lemay-ai / lazyTextPredict
Text classification AutoML
☆21 · Jul 18, 2021 · Updated 4 years ago
Annmayn / html2excel
Convert HTML tables to excel files
☆16 · Jul 3, 2021 · Updated 4 years ago
saaay71 / solr-vector-scoring
Vector Plugin for Solr: calculate dot product / cosine similarity on documents
☆35 · Oct 27, 2020 · Updated 5 years ago
rinkstiekema / table-structures
☆87 · Feb 12, 2020 · Updated 6 years ago
phamquiluan / PubLayNet
ICDAR 2019: MaskRCNN on the PubLayNet dataset. Paragraph detection, table detection, figure detection,...
☆183 · May 11, 2021 · Updated 5 years ago
blengerich / explainable-cnn
Towards Visual Explanations for Convolutional Neural Networks via Input Resampling
☆13 · Aug 16, 2017 · Updated 8 years ago
poloclub / tsr-convstem
High-Performance Transformers for Table Structure Recognition Need Early Convolutions
☆45 · Apr 21, 2026 · Updated 2 months ago
allenai / pawls
Software that makes labeling PDFs easy.
☆431 · May 13, 2024 · Updated 2 years ago
AudiTranscribe / AudiTranscribe
An open-source music transcription application.
☆13 · Sep 9, 2023 · Updated 2 years ago
Prakhar-97 / Table-detection-and-Document-layout-analysis
☆10 · Jun 22, 2020 · Updated 6 years ago
BordiaS / layoutlm
☆97 · Jul 13, 2020 · Updated 5 years ago
openzipkin / pyramid_zipkin-example
See how much time Python services spend on an HTTP request
☆14 · Feb 26, 2019 · Updated 7 years ago
Living-with-machines / histLM
Neural Language Models for Historical Research
☆29 · Oct 16, 2024 · Updated last year
isaacus-dev / mleb
The code used to evaluate embedding models on the Massive Legal Embedding Benchmark (MLEB).
☆39 · Feb 24, 2026 · Updated 4 months ago