dbashford / textract
node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
☆1,664Updated 2 years ago
Alternatives and similar repositories for textract:
Users who are interested in textract are comparing it to the libraries listed below:
- converts binary PDF to JSON and text, for server-side PDF processing and command-line use.☆2,062Updated last month
- Advanced html to text converter☆1,642Updated last year
- Node PDF Extract☆388Updated last year
- Node.js module for high performance creation, modification and parsing of PDF files and streams☆1,153Updated 2 weeks ago
- Download and extract files☆1,293Updated last year
- This repo isn't maintained anymore as phantomjs got deprecated a long time ago. Please migrate to headless chrome/puppeteer.☆3,561Updated 9 months ago
- Automatically extract body content (and other cool stuff) from an html document☆2,154Updated last year
- 🚜 Parse text and tables from PDF files.☆665Updated last month
- Standalone Office Open XML files (Microsoft Office 2007 and later) generator for Word (docx), PowerPoint (pptx) and Excel (xlsx) in java…☆2,670Updated 10 months ago
- Easy website screenshots in Node.js☆2,121Updated 5 years ago
- Distribute processing tasks to child processes with an über-simple API and baked-in durability & custom concurrency options.☆1,744Updated 3 years ago
- CSV parser and formatter for node☆1,698Updated this week
- A simple wrapper for the Tesseract OCR package☆675Updated 4 years ago
- natural language processor powered by plugins, part of the @unifiedjs collective☆2,382Updated last month
- Measure the difference between two strings with the fastest JS implementation of the Levenshtein distance algorithm☆721Updated 3 years ago
- a streaming interface for archive generation☆2,858Updated last week
- Full featured CSV parser with simple api and tested against large datasets.☆4,112Updated 3 months ago
- A module to create readable `"multipart/form-data"` streams. Can be used to submit forms and file uploads to other web applications.☆2,305Updated 3 weeks ago
- Streaming csv parser inspired by binary-csv that aims to be faster than everyone else☆1,445Updated last month
- NodeJS excel file parser & builder☆3,002Updated 8 months ago
- Access the system clipboard (copy/paste)☆1,818Updated last year
- Native node.js printer☆1,543Updated 2 years ago
- Nimble, streamable HTTP client for Node.js. With proxy, iconv, cookie, deflate & multipart support.☆1,639Updated last year
- Convert character encodings in pure javascript.☆3,109Updated last year
- Node module that summarizes text using a naive summarization algorithm☆771Updated 4 months ago
- A persistent, network resilient, full text search library for the browser and Node.js☆1,406Updated this week
- A Javascript implementation of zip for nodejs. Allows user to create or extract zip files both in memory or to/from disk☆2,086Updated 2 weeks ago
- ImageMagick's Magick++ bindings for NodeJS☆629Updated 4 years ago
- HTML parsing/serialization toolset for Node.js. WHATWG HTML Living Standard (aka HTML5)-compliant.☆3,742Updated this week
- Flexible ascii progress bar for nodejs☆2,980Updated 2 years ago