CopenhagenCityArchives / CorrectOCR

Machine Learning-assisted correction of OCR errors in historical corpora

☆10

Alternatives and similar repositories for CorrectOCR

Users who are interested in CorrectOCR are comparing it to the libraries listed below.

koursaros-ai / microservices
Neural Elastic Inference and Search
☆19 Updated 5 years ago
wbsg-uni-mannheim / productCategorization
This repository contains code and data download instructions for the workshop paper "Improving Hierarchical Product Classification using …
☆17 Updated 4 years ago
Aayushpatel007 / topicrankpy
A Python package to get useful information from documents using TopicRank Algorithm.
☆16 Updated 2 years ago
data-describe / awesome-data-science-models
A few end to end examples that use data-describe
☆16 Updated 2 years ago
acatovic / textrank
Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot…
☆11 Updated 4 years ago
Hironsan / google-natural-language-sampler
Code examples for Google Natural Language API.
☆13 Updated 5 years ago
PedroBarcha / context-spelling-correction
Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras…
☆11 Updated 6 years ago
AnushaMeka / NLP-Topic-Modeling-LDA-NMF
☆14 Updated 6 years ago
johnbumgarner / newshound
This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t…
☆33 Updated 2 years ago
lemay-ai / lazyTextPredict
Text classification AutoML
☆21 Updated 3 years ago
jhrcook / textrank-streamlit
A web app built with Streamlit that summarizes input text
☆13 Updated 4 years ago
TyMick / loan-risk-neural-network
Loan Risk Prediction Neural Network and API
☆17 Updated 4 years ago
kristopherkyle / TAALED
Tool for the Automatic Assessment of Lexical Diversity
☆12 Updated 4 years ago
paperd / tensorflow
TensorFlow materials
☆13 Updated 4 years ago
kbastani / russian-troll-analysis
This repository auto-configures an Apache Pinot and Superset cluster for analyzing IRA tweets from FiveThirtyEight.
☆11 Updated 4 years ago
OCR-D / ocrd_tesserocr
Run tesseract with the tesserocr bindings with @OCR-D's interfaces
☆39 Updated 2 months ago
david880110 / Media-Behavior-Trends-Analytics
☆13 Updated 2 years ago
floriancochard / extract-data-from-paper
A tool designed to extract numerical data from scanned historical weather documents.
☆13 Updated 7 months ago
text-machine-lab / TEA
Extracting narrative timelines (i.e. order and timing of events) from text
☆20 Updated 6 years ago
robgon-art / BIG.art
Using Machine Learning to Create High-Res Fine Art
☆12 Updated last year
facebookresearch / DataEfficientNLG
NLG Best Practices for Data-Efficient Modeling: How to Train Production-Ready Models with Little Data
☆10 Updated 3 years ago
deeppavlov / dp_tutorials
☆34 Updated 5 years ago
jbesomi / fastlaw
Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts.
☆38 Updated 6 years ago
BramVanroy / spacy-extreme
An example of how to use spaCy for extremely large files without running into memory issues
☆36 Updated 2 years ago
megagonlabs / teddy
Code and data for Teddy https://arxiv.org/abs/2001.05171.
☆15 Updated 3 years ago
MBAigner / PDFSegmenter
This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…
☆23 Updated 4 years ago
berkmancenter / corpusbuilder
Corpus Build OCR platform
☆8 Updated 2 years ago
robgon-art / ai-memer
Using Machine Learning to Create Funny Memes
☆25 Updated 2 years ago
YugantM / textcleaner
text-data pre-processing utility
☆13 Updated 3 years ago
brucewlee / wiki-text-summarizer-keyword-extractor
Uses Beautiful Soup to read Wiki pages, Gensim to summarize, NLTK to process, and extracts keywords based on entropy: everything in one b…
☆9 Updated 4 years ago