dbashford / textract
Node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
☆1,667 · Updated 2 years ago
Alternatives and similar repositories for textract:
Users who are interested in textract also compare it to the libraries listed below.
- A wrapper for the wkhtmltopdf HTML to PDF converter using WebKit ☆610 · Updated 2 years ago
- Node.js module for high performance creation, modification and parsing of PDF files and streams ☆1,158 · Updated 2 months ago
- Standalone Office Open XML files (Microsoft Office 2007 and later) generator for Word (docx), PowerPoint (pptx) and Excel (xlsx) in java… ☆2,684 · Updated last year
- 🚜 Parse text and tables from PDF files. ☆674 · Updated 3 months ago
- A NodeJS module to generate Excel files in .xlsx format from a template created with Excel itself ☆413 · Updated 2 weeks ago
- Distribute processing tasks to child processes with an über-simple API and baked-in durability & custom concurrency options. ☆1,743 · Updated 3 years ago
- A streaming interface for archive generation ☆2,870 · Updated last week
- Advanced html to text converter ☆1,651 · Updated last year
- Node module that summarizes text using a naive summarization algorithm ☆770 · Updated 6 months ago
- ImageMagick's Magick++ bindings for NodeJS ☆629 · Updated 4 years ago
- Download and extract files ☆1,294 · Updated last year
- A generic rate limiter for node.js. Useful for API clients, web crawling, or other tasks that need to be throttled ☆1,536 · Updated this week
- rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects) ☆1,930 · Updated 6 years ago
- Node PDF Extract ☆389 · Updated last year
- Node Application Metrics provides a foundational infrastructure for collecting resource and performance monitoring data for Node.js-based… ☆980 · Updated 9 months ago
- Generate docx, pptx, and xlsx from templates (Word, PowerPoint and Excel documents), from Node.js or the browser. Demo: https://www.docxt… ☆3,284 · Updated this week
- Flexible ASCII progress bar for Node.js ☆2,986 · Updated 2 years ago
- Converts binary PDF to JSON and text for server-side PDF processing and command-line use. ☆2,085 · Updated 3 months ago
- Converts HTML documents to DOCX in the browser ☆1,095 · Updated 3 years ago
- Node.js Excel file parser & builder ☆3,019 · Updated 10 months ago
- Node module to allow for easy Excel file creation ☆1,376 · Updated 2 years ago
- pdftohtmljs: a shell wrapper for the PDF-to-HTML converter pdf2htmlEX ☆145 · Updated 2 years ago
- HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant. ☆3,762 · Updated last week
- This repo is no longer maintained because PhantomJS was deprecated long ago. Please migrate to headless Chrome/Puppeteer. ☆3,565 · Updated 11 months ago
- Promisify a callback-style function ☆1,504 · Updated 2 years ago
- Extra JavaScript string methods. ☆1,808 · Updated 3 years ago
- Native Node.js implementation of MaxMind's GeoIP API; works in Node 0.6.3 and above (ask me about other versions) ☆2,366 · Updated last year
- CSV parser and formatter for node ☆1,713 · Updated this week
- Agenda Dashboard ☆801 · Updated 8 months ago
- Streaming csv parser inspired by binary-csv that aims to be faster than everyone else ☆1,456 · Updated 3 months ago