maiaPhilippe / pdf-to-text
PDF OCR in pure JavaScript using the tesseract.js API
☆20 · Updated 7 years ago
Related projects:
- A React component for annotating PDF, powered by PDF.js and RecogitoJS ☆45 · Updated 5 months ago
- Annotation layer for pdf.js ☆260 · Updated last week
- Ergonomic line-by-line transcription of scanned text. ☆47 · Updated 3 years ago
- Script that sets up and configures an entire CQPweb server installation ☆11 · Updated 4 years ago
- guides and test data for OCR4all ☆30 · Updated last year
- 🎞 transcribe > annotate > remix > publish video and audio content ☆19 · Updated 3 months ago
- Miscellaneous data analysis tools and scripts for the EHRI project ☆12 · Updated 7 months ago
- ☆31 · Updated last year
- Image annotation block for Airtable ☆45 · Updated 3 years ago
- Provides OCR (Optical Character Recognition) services through web applications ☆231 · Updated 7 months ago
- METS/ALTO OCR enhancing tool by the National Library of Luxembourg (BnL) ☆52 · Updated last year
- gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf. ☆100 · Updated 3 years ago
- Validate and transform various OCR file formats (hOCR, ALTO, PAGE, FineReader) ☆176 · Updated last month
- Open Video Annotation Project ☆111 · Updated 7 years ago
- An editor for speech-to-text transcripts such as AWS Transcribe and Mozilla DeepSpeech ☆85 · Updated last month
- Detect and align similar passages ☆86 · Updated 2 weeks ago
- Conversions between various OCR formats ☆71 · Updated last year
- Web based JavaScript GUI library for proofreading/editing hOCR ☆89 · Updated 6 years ago
- Web interface for recognizing text, proofreading OCR, and creating fully-digitized documents. ☆82 · Updated last week
- A JavaScript library for text annotation ☆359 · Updated 5 months ago
- ALTO XML schema - latest and all former versions ☆51 · Updated 2 months ago
- MathML Cloud API ☆28 · Updated 3 years ago
- Ground Truth Resources for the HTR of patrimonial documents ☆37 · Updated this week
- PAGE XML format collection for document image page content and more ☆62 · Updated 3 years ago
- Images of example pages from Transkribus model training sets to make it easier to find a match. ☆12 · Updated 2 years ago
- PDF parser and converter to HTML ☆82 · Updated last year
- Tools for manipulating and evaluating the hOCR format for representing multi-lingual OCR results by embedding them into HTML. ☆363 · Updated last month
- A semi-automatic open-source tool for Layout Analysis and Region EXtraction on early printed books. ☆179 · Updated last month
- ☆11 · Updated last year
- PDF to XML ALTO file converter ☆209 · Updated this week