🏭 PDF text extraction pipeline: self-hosted, local-first, Docker-based
☆332Oct 13, 2023Updated 2 years ago
Alternatives and similar repositories for pd3f
Users that are interested in pd3f are comparing it to the libraries listed below.
- SLUB Document Classification and Similarity Analysis☆10Aug 31, 2023Updated 2 years ago
- List of people, organisations, groups, … doing datavis in Berlin☆11Mar 17, 2026Updated 3 weeks ago
- A Python library for defining rule-based overrides on messy data☆18Nov 24, 2025Updated 4 months ago
- Platform for journalists to search, analyse, categorise and share unstructured data☆60Updated this week
- Transforms PDF, Documents and Images into Enriched Structured Data☆6,173Mar 20, 2026Updated 2 weeks ago
- Provide partial dates and retain the date precision through processing☆14Aug 4, 2025Updated 8 months ago
- Machine-readable party manifestos for the 2019 European election☆13May 14, 2019Updated 6 years ago
- An alpha project combining beneficial ownership and contracting data☆13Jun 9, 2021Updated 4 years ago
- Data cleaning and validation functions for names, languages, identifiers, etc.☆57Mar 30, 2026Updated last week
- The Toolkit API, app, and browser extension. Start preserving now.☆49Updated this week
- A collaborative collection of structured datasets and document collections that are common to use within "Follow the Money" investigation…☆15Apr 1, 2026Updated last week
- A Python scraping module that extracts text from articles found in RSS feeds. Uses SQLite as database.☆20Jul 5, 2024Updated last year
- This is a prototype of a semi-automatic data anonymization app for German documents. ➡️ The project has moved to: https://gitlab.opencode…☆24Mar 20, 2026Updated 2 weeks ago
- ETL pipeline, graphical explorer and general toolbox for investigations with follow the money data☆25Jul 15, 2025Updated 8 months ago
- A Python helper library to convert between ISO 639 two- and three-letter codes.☆11Nov 13, 2024Updated last year
- Collecting good beginner tasks and project ideas.☆16Apr 23, 2018Updated 7 years ago
- UPNP for node.js☆15Mar 21, 2019Updated 7 years ago
- OffeneRegister.de – Open data for the German commercial register (Handelsregister)☆35Feb 2, 2026Updated 2 months ago
- This is a German ELMo deep contextualized word representation. It is trained on a special German Wikipedia Text Corpus.☆28Dec 15, 2019Updated 6 years ago
- Reichsanzeiger-NLP: NER/NEL corpus for the German historical newspaper "Deutscher Reichsanzeiger und Preußischer Staatsanzeiger" (1819–19…☆16Oct 18, 2024Updated last year
- Extract networks of entities from journalistic reporting☆49Jul 17, 2023Updated 2 years ago
- Transparenzranking.de compares all of Germany's transparency regulations☆12Mar 26, 2026Updated 2 weeks ago
- A miniature version of the l4 language☆13Jun 29, 2025Updated 9 months ago
- Ask questions about government data.☆38Jan 17, 2019Updated 7 years ago
- ☆19Jan 16, 2024Updated 2 years ago
- GLiNER inference in JavaScript☆24Mar 2, 2025Updated last year
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N…☆278Oct 9, 2022Updated 3 years ago
- Javascripts☆11Mar 1, 2026Updated last month
- A Repo For Document AI☆3,154Mar 31, 2026Updated last week
- API client for Aleph, supports bulk entity and document upload.☆29Mar 5, 2026Updated last month
- Analysis of the Pegida Facebook corpus☆10Jan 31, 2015Updated 11 years ago
- Vite on Cloudflare Pages☆13Nov 1, 2023Updated 2 years ago
- Python toolbox to load, parse and process Official Journals of the European Union (EU).☆22May 3, 2024Updated last year
- Character info under different encodings☆27Sep 12, 2025Updated 6 months ago
- A command line and Python client for Open-Spending☆10Nov 24, 2017Updated 8 years ago
- ☆14Aug 9, 2024Updated last year
- Train a SmolLM-style llm on fineweb-edu in JAX/Flax with an assortment of optimizers.☆19Jul 24, 2025Updated 8 months ago
- OCR-D compliant toolset for optical layout recognition on historical German-language documents published in Brazil☆11Sep 24, 2021Updated 4 years ago
- ☆12May 31, 2016Updated 9 years ago