blueprintparadise / booknlp2
A successor to booknlp, aiming to fix bugs and improve model performance
☆10 Updated 2 months ago
Related projects:
- ☆28 Updated 2 years ago
- Document processing using transformers ☆19 Updated last year
- ☆11 Updated last year
- Recognition of handwritten text using CRAFT text detection and TrOCR ☆24 Updated last year
- Source code for the paper "Post-OCR Document Correction with Large Ensembles of Character Sequence-to-Sequence Models" ☆34 Updated 9 months ago
- Build Semantic Search with S-BERT and fine-tune your model in an unsupervised way ☆57 Updated 2 years ago
- ☆12 Updated this week
- Code accompanying the submission "Structural Text Segmentation of Legal Documents" by Aumiller et al. ☆96 Updated last year
- This PyTorch implementation of the LayoutLM paper by Microsoft demonstrates the SequenceClassification task using HuggingFace Transformers to cl… ☆31 Updated 2 years ago
- Layout Analysis Dataset with Segmonto (LADaS) ☆17 Updated 2 months ago
- A large-scale infographics dataset from Visual.ly with metadata and additional crowdsourced annotations ☆12 Updated 5 years ago
- Data and evaluation code for the paper WikiNEuRal: Combined Neural and Knowledge-based Silver Data Creation for Multilingual NER (EMNLP 2… ☆65 Updated last year
- Streamlit Named Entity Recognition (NER) annotation custom component ☆38 Updated last year
- Newspaper Segmentation into images and text ☆12 Updated 5 years ago
- Python 3 library for processing historical English ☆64 Updated last month
- https://kohinoor-soubam.medium.com/ ☆14 Updated 3 years ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆52 Updated last year
- ☆20 Updated 5 years ago
- Master repository which includes most other OCR-D repositories as submodules ☆71 Updated last month
- Ground Truth Resources for the HTR of patrimonial documents ☆37 Updated this week
- A Python library for extracting text from PDFs without losing the formatting of the PDF content. ☆72 Updated 2 years ago
- Segmenting text blocks and baselines from documents using deep learning techniques ☆12 Updated 3 years ago
- Abstractive and Extractive Text summarization using Transformers. ☆83 Updated last year
- Detecting company logos using deep learning ☆17 Updated 2 years ago
- A list of awesome AI in libraries, archives, and museum collections from around the world 🕶️ ☆82 Updated 3 months ago
- ☆132 Updated 6 months ago
- Sample implementation of OCR metrics (CER, WER) calculation with TesseractOCR and fastwer ☆28 Updated 3 years ago
- Natural language processing resources for multiple languages, with an eye towards use for digital humanities. ☆124 Updated 3 years ago
- This repository contains a notebook that demonstrates the power of the Document Text Recognition (DocTR) library ☆12 Updated 3 years ago
- Latin BERT ☆56 Updated 2 months ago