pd3f/pd3f-core

📑 Python Package to reconstruct the original continuous text from PDFs with language models

☆33

Alternatives and similar repositories for pd3f-core

Users who are interested in pd3f-core are comparing it to the libraries listed below.

n-waves / ulmfit4de
ULMFiT Method for German Language
☆15 • May 10, 2019 • Updated 7 years ago
neuralmind-ai / information-extraction-t5
☆12 • Apr 29, 2022 • Updated 4 years ago
tonianelope / Multilingual-BERT
Investigating multilingual language models (BERT) by using them for NER in German and English
☆14 • Apr 30, 2019 • Updated 7 years ago
ibm-aur-nlp / domain-specific-QA
Extracting six domain-specific QA datasets from MS MARCO
☆17 • Dec 1, 2019 • Updated 6 years ago
pawel-bujnowski / smiler
SMiLER - Samsung MultiLingual Entity and Relation Extraction dataset
☆18 • Feb 11, 2021 • Updated 5 years ago
talmago / spacy_crfsuite
Sequence tagging with spaCy and crfsuite
☆21 • Mar 18, 2023 • Updated 3 years ago
htrc / htrc-feature-reader
Tools for working with HTRC Feature Extraction files
☆44 • Jul 8, 2025 • Updated last year
dan-zheng / swift
The Swift Programming Language
☆13 • Aug 4, 2021 • Updated 4 years ago
openredact / expose-text
This is a prototype of a Python module for simple modification of document files. ➡️ The project has moved to: https://gitlab.opencode.de…
☆19 • Mar 20, 2026 • Updated 4 months ago
dbmdz / historic-ner
Repository for "Towards Robust Named Entity Recognition for Historic German"
☆18 • Dec 11, 2020 • Updated 5 years ago
thakur-nandan / income
INCOME: An Easy Repository for Training and Evaluation of Index Compression Methods in Dense Retrieval. Includes BPR and JPQ.
☆24 • Sep 24, 2023 • Updated 2 years ago
alephdata / alephclient
API client for Aleph, supports bulk entity and document upload.
☆30 • Mar 5, 2026 • Updated 4 months ago
tomh5905 / LOREM
A Language-consistent Open Relation Extraction Model.
☆16 • Mar 24, 2023 • Updated 3 years ago
FORMAS / awesome_openie
☆24 • Oct 3, 2023 • Updated 2 years ago
openredact / openredact-app
This is a prototype of a semi-automatic data anonymization app for German documents. ➡️ The project has moved to: https://gitlab.opencode…
☆24 • Mar 20, 2026 • Updated 4 months ago
sdockray / dat-syllabus
Peer-to-peer markdown syllabus platform for Beaker Browser.
☆14 • Dec 11, 2017 • Updated 8 years ago
TabbyML / quick-question
☆13 • Apr 8, 2023 • Updated 3 years ago
ChristophAlt / tuna
Hyperparameter search for AllenNLP, powered by Ray Tune
☆28 • Mar 6, 2025 • Updated last year
kermitt2 / biblio_glutton_harvester
Open Access PDF harvester
☆42 • May 3, 2024 • Updated 2 years ago
janmbuys / DeepDeepParser
Neural Semantic Graph Parser
☆29 • Mar 14, 2018 • Updated 8 years ago
insin / hta-localstorage
Basic localStorage implementation for Internet Explorer HTML Applications (HTA)
☆13 • Nov 2, 2014 • Updated 11 years ago
jes5199 / chief-wiggum
it's just Ralph with a hat on
☆34 • Jan 5, 2026 • Updated 6 months ago
antlibs / ant-contrib
A fork of Ant-Contrib tasks project at SourceForge
☆13 • Aug 27, 2023 • Updated 2 years ago
maxdotio / node-mighty-qdrant-starter
Node starter kit for semantic search. Uses Mighty Inference Server with Qdrant vector search.
☆15 • May 15, 2023 • Updated 3 years ago
sdockray / syllabus-hub
A GitHub for syllabi
☆14 • Dec 11, 2017 • Updated 8 years ago
naver / domainshift-prediction
☆11 • May 26, 2020 • Updated 6 years ago
Phyks / BMC
BMC (BiblioManagementClient) is a simple script to download and store your articles.
☆16 • Mar 30, 2016 • Updated 10 years ago
gooofy / transformer-lm
Transformer language model (GPT-2) with sentencepiece tokenizer
☆10 • Oct 15, 2019 • Updated 6 years ago
wolfgangmm / tei-simple-pm
An implementation of the TEI Simple ODD extensions for processing models in XQuery.
☆22 • Jul 24, 2019 • Updated 6 years ago
maxdotio / mighty-batch
Highly concurrent and fast content processing for Mighty Inference Server
☆10 • Feb 6, 2023 • Updated 3 years ago
Syntea / xdef
X-definition 4.2 (Open Source Software)
☆17 • Updated this week
openredact / anonymizer
A Python module that provides multiple anonymization techniques for text (this is only a prototype). ➡️ The project has moved to: https://…
☆26 • Mar 20, 2026 • Updated 4 months ago
lukasgarbas / can-we-tune-together
Combining encoder-based language models
☆11 • Nov 11, 2021 • Updated 4 years ago
patverga / plant_jones
☆11 • Nov 10, 2015 • Updated 10 years ago
kermitt2 / entity-fishing
A machine learning tool for fishing entities
☆268 • Feb 27, 2026 • Updated 4 months ago
aarroyoc / node-xulrunner
Like NW.js and node-webkit but with Gecko using XULRunner
☆12 • May 12, 2017 • Updated 9 years ago
nstawfik / MedSentEval
☆11 • Nov 19, 2020 • Updated 5 years ago
DARIAH-DE / DARIAH-DKPro-Wrapper
Wrapper for DKPro Core to extract linguistic information from books.
☆16 • Feb 26, 2022 • Updated 4 years ago
o3o / dformlib
Yet another fork of DFL
☆11 • Jul 19, 2016 • Updated 10 years ago