ocropus / ocropus4-eval
Tools for evaluating OCR performance relative to ground truth.
☆10 Updated 10 months ago
Related projects
Alternatives and complementary repositories for ocropus4-eval
- DFKI Layout Detection for OCR-D ☆47 Updated 2 weeks ago
- ☆20 Updated last year
- Nougat is Meta AI's revolutionary OCR model designed to transcribe scientific PDFs into an easy-to-use Markdown format. ☆21 Updated last year
- Integrate AI-powered Document Analysis Pipelines ☆62 Updated this week
- >>PhysWikiQuiz<< - a Physics Question Generation and Interrogation System ☆11 Updated last year
- Logical structure analysis for visually structured documents ☆84 Updated 2 years ago
- Fast whitespace correction with Transformers ☆14 Updated 6 months ago
- Small python package to measure OCR quality and other related metrics. ☆21 Updated 9 months ago
- Run tesseract with the tesserocr bindings with @OCR-D's interfaces ☆39 Updated 3 months ago
- A Python pipeline tool and plugin ecosystem for processing technical documents. Process papers from arXiv, SemanticScholar, PDF, with GRO… ☆44 Updated 3 months ago
- This library builds a graph-representation of the content of PDFs. The graph is then clustered, resulting page segments are classified an… ☆22 Updated 4 years ago
- A Benchmark of PDF Information Extraction Tools using a Multi-Task and Multi-Domain Evaluation Framework for Academic Documents ☆19 Updated last year
- OCR-D post-correction module based on weighted finite-state transducers ☆11 Updated 10 months ago
- Seed Machine Translation Data ☆30 Updated last week
- Corpus Build OCR platform ☆8 Updated last year
- Conversions between various OCR formats ☆71 Updated last year
- Run OCR, extract information from documents and classify them. In addition, annotate documents and build custom NLP and computer vision m… ☆62 Updated this week
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆52 Updated last year
- Keyword spaCy is a spaCy pipeline component for extracting keywords from text using cosine similarity. ☆9 Updated 11 months ago
- Repository for deepdoctection tutorial notebooks ☆39 Updated 4 months ago
- Open Access PDF harvester ☆35 Updated 6 months ago
- PAGE XML format collection for document image page content and more ☆66 Updated 3 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine ☆31 Updated 2 years ago
- TeX compilation service that makes use of arXiv.org's AutoTeX library. ☆27 Updated 5 months ago
- ☆15 Updated 3 years ago
- ☆21 Updated 8 months ago
- ☆68 Updated 8 months ago
- 💥 Use Hugging Face text and token classification pipelines directly in spaCy ☆62 Updated 8 months ago
- Document Image Binarization ☆73 Updated last month
- Scrollership through 20m pubmed abstracts. ☆25 Updated last year