ScientaNL / pdf-extractor
Node.js module for rendering PDF pages to images, SVGs, HTML files, text files and JSON metadata
☆100Updated 2 years ago
Alternatives and similar repositories for pdf-extractor
Users that are interested in pdf-extractor are comparing it to the libraries listed below
- nodejs lib for extracting data from PDF files☆232Updated last year
- Muhammara, a Node module with C/C++ bindings to modify PDFs with JS for Node or Electron (based on, and a replacement for, galkhana/hummusjs)☆264Updated 6 months ago
- A Node.js library to parse text out of any office file. Currently supports docx, pptx, xlsx and odt, odp, ods.☆208Updated 7 months ago
- Simple node package to convert a PDF into images.☆194Updated 8 months ago
- Microsoft Word doc/docx to PDF conversion, client-side in-browser, using docx-wasm☆55Updated 6 years ago
- Get text content from any file☆65Updated 10 months ago
- A high-performance in-memory converter to convert SVG to PNG/JPEG images for Node.☆166Updated last year
- A wrapper for PDF Toolkit with streams and promises.☆141Updated last year
- Annotation layer for pdf.js☆284Updated 8 months ago
- Read data from a Word document using node.js☆142Updated last year
- Parser to convert PPTX to JSON format☆89Updated 2 years ago
- Emscripten port of Tesseract C++ API☆174Updated 5 months ago
- 📰 Yet another WebAssembly PDF renderer for Node and the browser☆197Updated 11 months ago
- a javascript docx parser☆382Updated 4 months ago
- PDF.js-based PDF files viewer with annotation support☆95Updated 10 months ago
- ☆93Updated 4 months ago
- Interactive PPTX slide viewer☆38Updated 7 years ago
- Asynchronous Node.js wrapper for the Poppler PDF rendering library☆219Updated this week
- Pure Javascript reader/writer for PowerPoint☆144Updated 9 years ago
- An npm utility program to convert office documents (documents/excel/presentations) into PDF/HTML☆37Updated 4 years ago
- Windows MetaFile (wmf) processor☆18Updated 5 years ago
- pdf2html is a module which helps to convert PDF file to HTML pages using Apache Tika. This module also helps to generate thumbnail image …☆183Updated 2 weeks ago
- A Node.js wrapper for the Tesseract OCR API☆310Updated last year
- Javascript library for creating and manipulating Open XML Documents like docx, xlsx, etc. User can export grid data or images to open xml…☆31Updated 2 years ago
- RFC 822 EML file format parser and builder☆93Updated 2 years ago
- A tiny (< 100 LoC) library for trimming whitespace from a canvas element with no dependencies☆71Updated 5 years ago
- ☆189Updated 4 years ago
- Allows you to convert an HTML document into DOCX☆41Updated 2 years ago
- Extracts email address from an arbitrary text input.☆62Updated 4 months ago
- Building PDFium for WebAssembly☆75Updated 4 years ago