mawanda-jun / IntelligentOCR
An intelligent OCR to detect tables and plain text inside PDFs and obtain a CSV file and a TXT file from them
☆14Updated 6 years ago
Alternatives and similar repositories for IntelligentOCR:
Users that are interested in IntelligentOCR are comparing it to the libraries listed below
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an…☆22Updated 4 years ago
- Layout analysis + OCR☆11Updated 2 years ago
- Convert a corpus of PDF to clean text files on a distributed architecture☆38Updated last year
- ☆16Updated 3 years ago
- Given a text, wrap it into phrases and send them to Yandex's search engine. If it yields a "did you mean:", substitute the original phras…☆11Updated 6 years ago
- Document Image Classification☆11Updated 6 years ago
- An efficient data structure for fast string similarity searches☆22Updated 4 years ago
- Convert text from PDF to XML.☆45Updated 6 years ago
- Image Pre-processing to improve OCR accuracy.☆20Updated 8 years ago
- Machine Learning-assisted correction of OCR errors in historical corpora☆9Updated 4 months ago
- ☆18Updated 6 years ago
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.☆18Updated last year
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces☆39Updated this week
- Analyze XML extracted from PDFs (e.g. from TET or PDFMiner)☆20Updated 7 years ago
- Text classification automl☆21Updated 3 years ago
- PDF Table Extractor - repository to hold revisable version of code from https://www.cvast.tuwien.ac.at/projects/pdf2table by Burcu Yildiz☆38Updated 11 months ago
- A tool designed to extract numerical data from scanned historical weather documents.☆13Updated 3 months ago
- Tools for extracting figures, tables, text, etc. from a PDF document.☆32Updated 4 years ago
- YOLT (You Only Look Twice) - a tool that attempts to improve the accuracy of YOLOv4 in images☆21Updated 4 years ago
- Parser for KAF NAF files written in Python☆16Updated 3 years ago
- Ssebowa is a free and open-source Python library that provides generative AI models.☆14Updated last year
- Implementation of BertGrid : https://arxiv.org/abs/1909.04948☆30Updated 11 months ago
- A smart distributed crawler that infers navigation models of structured websites, used to cluster pages based on their structure and extr…☆8Updated 3 years ago
- Document Layout Analysis Projects☆23Updated 5 years ago
- Python bindings for Neo4j☆26Updated 10 years ago
- A News Article Collection Library☆22Updated last year
- Simple Flask webservice to search through your PDF collection using Whoosh☆11Updated 10 years ago
- A DeepWalk implementation for ontologies using NetworkX and Gensim☆18Updated 7 years ago
- Create your own Awesome List from GitHub stars!☆12Updated 4 months ago
- Collaborative Discourse Manager☆11Updated 8 years ago