ljvmiranda921/prodigy-pdf-custom-recipe

Custom recipe and utilities for document processing

☆201

Alternatives and similar repositories for prodigy-pdf-custom-recipe

Users that are interested in prodigy-pdf-custom-recipe are comparing it to the libraries listed below.

axa-group / Parsr
Transforms PDF, Documents and Images into Enriched Structured Data
☆6,177 · Mar 20, 2026 · Updated 4 months ago
koaning / bulk
A Simple Bulk Labelling Tool
☆599 · Jul 29, 2025 · Updated 11 months ago
Lucaterre / spacyfishing
A spaCy wrapper of Entity-Fishing (component) for named entity disambiguation and linking on Wikidata
☆173 · Nov 7, 2022 · Updated 3 years ago
graviraja / nlp-paper-summary
☆16 · Oct 12, 2020 · Updated 5 years ago
NorskRegnesentral / skweak
skweak: A software toolkit for weak supervision applied to NLP tasks
☆925 · Sep 2, 2024 · Updated last year
allenai / sequential_sentence_classification
https://arxiv.org/pdf/1909.04054
☆77 · Nov 2, 2022 · Updated 3 years ago
fakabbir / OCR
Probabilistic Key Value pair extraction using word weights from Invoices - Non Searchable PDF
☆17 · Jun 12, 2021 · Updated 5 years ago
SapienzaNLP / extend
Entity Disambiguation as text extraction (ACL 2022)
☆182 · Apr 17, 2022 · Updated 4 years ago
cyclecycle / spacy-pattern-builder
Reverse engineer patterns for use with SpaCy's DependencyMatcher
☆36 · Feb 8, 2020 · Updated 6 years ago
etalab-ia / mediatech
Collection of public datasets from the French administration, vectorized and ready to use in AI projects.
☆17 · Jan 26, 2026 · Updated 5 months ago
deepset-ai / rasa-haystack
☆49 · Mar 30, 2023 · Updated 3 years ago
AutoViML / lazytransform
Automatically transform all categorical, date-time, NLP variables to numeric in a single line of code for any data set any size.
☆65 · Jan 29, 2025 · Updated last year
hmnth1 / table_ocr
☆13 · Oct 1, 2020 · Updated 5 years ago
tejasvaidhyadev / NER_Lab_Protocols
Domain-specific BERT representation for Named Entity Recognition of lab protocol
☆29 · Dec 25, 2020 · Updated 5 years ago
JournalismAI-2021-Quotes / quote-extraction
Quote extraction for modular journalism (JournalismAI collab 2021)
☆230 · Feb 2, 2022 · Updated 4 years ago
khalidsaifullaah / BERTify
An easy-to-use Python module that helps you to extract the BERT embeddings for a large text dataset (Bengali/English) efficiently.
☆36 · May 18, 2023 · Updated 3 years ago
vizzuhq / ipyvizzu
Build animated charts in Jupyter Notebook and similar environments with a simple Python syntax.
☆971 · Feb 26, 2025 · Updated last year
DidierRLopes / openbb-slack-agent
An OpenBB agent slack bot that is ready to answer any financial question
☆12 · Feb 24, 2024 · Updated 2 years ago
z3tt / fundamentals-ggplot2-pearson
Material for the Pearson × O’Reilly Live Training Session "Hands-On Data Visualization with ggplot2: Concepts"
☆11 · Aug 29, 2023 · Updated 2 years ago
explosion / jupyterlab-prodigy
🧬 A JupyterLab extension for annotating data with Prodigy
☆190 · May 10, 2023 · Updated 3 years ago
LeapBeyond / scrubadub_spacy
Clean personally identifiable information from dirty dirty text using spaCy.
☆41 · Sep 1, 2023 · Updated 2 years ago
erre-quadro / spikex
SpikeX - SpaCy Pipes for Knowledge Extraction
☆403 · Jul 30, 2021 · Updated 4 years ago
webis-de / small-text
Active Learning for Text Classification in Python
☆646 · May 24, 2026 · Updated last month
wjbmattingly / spacy_tutorials_3x
☆20 · May 23, 2021 · Updated 5 years ago
mindee / doctr
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning. Ongo…
☆6,190 · Updated this week
explosion / spacy-experimental
🧪 Cutting-edge experimental spaCy components and features
☆104 · Apr 23, 2024 · Updated 2 years ago
explosion / projects
🪐 End-to-end NLP workflows from prototype to production
☆1,432 · Oct 15, 2024 · Updated last year
CLARIN-PL / LEPISZCZE
This is the way: designing and compiling LEPISZCZE, a comprehensive NLP benchmark for Polish
☆15 · May 20, 2026 · Updated 2 months ago
vespa-engine / vespa-search
☆13 · Updated this week
jboynyc / textnets
Text analysis with networks.
☆294 · May 14, 2026 · Updated 2 months ago
doubleshow / superlinked
A compute framework for building Search, RAG, Recommendations and Analytics over complex (structured+unstructured) data, with ultra-modal…
☆12 · Sep 16, 2024 · Updated last year
IBM / zshot
Zero and Few shot named entity & relationships recognition
☆400 · Sep 17, 2025 · Updated 10 months ago
aFelipeSP / pdfme
Make PDFs easily
☆324 · Mar 17, 2026 · Updated 4 months ago
soumik12345 / Adventures-with-GANS
Showcasing various fun adventures with GANs
☆14 · Mar 24, 2023 · Updated 3 years ago
explosion / assets
💥 Explosion Assets
☆45 · Dec 10, 2023 · Updated 2 years ago
MaartenGr / PolyFuzz
Fuzzy string matching, grouping, and evaluation.
☆800 · Jul 10, 2025 · Updated last year
alexnowakvila / DiCoNet
☆10 · Feb 22, 2018 · Updated 8 years ago
Layout-Parser / layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
☆5,765 · Aug 15, 2024 · Updated last year
thiippal / MoodCat
MoodCat😼 classifies the mood of English sentences.
☆14 · Jun 19, 2022 · Updated 4 years ago