dbashford / textract
Node.js module for extracting text from HTML, PDF, DOC, DOCX, XLS, XLSX, CSV, PPTX, PNG, JPG, GIF, RTF and more!
☆1,664 · Updated 2 years ago
Alternatives and similar repositories for textract:
Users that are interested in textract are comparing it to the libraries listed below
- Node module to allow for easy Excel file creation ☆1,376 · Updated 2 years ago
- A JavaScript DOCX parser ☆376 · Updated last month
- Standalone Office Open XML files (Microsoft Office 2007 and later) generator for Word (docx), PowerPoint (pptx) and Excel (xlsx) in java… ☆2,681 · Updated 11 months ago
- Advanced HTML to text converter ☆1,649 · Updated last year
- Full-featured CSV parser with a simple API, tested against large datasets. ☆4,124 · Updated 4 months ago
- Download and extract files ☆1,293 · Updated last year
- CSV parser and formatter for Node ☆1,704 · Updated this week
- A JavaScript library for defining recurring schedules and calculating future (or past) occurrences for them. Includes support for using … ☆2,419 · Updated 7 years ago
- Agenda Dashboard ☆799 · Updated 7 months ago
- `rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects)` ☆1,926 · Updated 6 years ago
- Generate hashes from JavaScript objects in Node.js and the browser. ☆1,437 · Updated 8 months ago
- PDF manipulation in Node.js! Split, join, crop, read, extract, boil, mash, stick them in a stew. ☆286 · Updated last month
- A persistent, network-resilient, full-text search library for the browser and Node.js ☆1,409 · Updated this week
- JavaScript utility for calculating deep difference, capturing changes, and applying changes across objects; for Node.js and the browser. ☆3,026 · Updated last year
- Node.js module for high-performance creation, modification and parsing of PDF files and streams ☆1,156 · Updated last month
- Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance. ☆2,526 · Updated last year
- A streaming interface for archive generation ☆2,863 · Updated last month
- Converts binary PDF to JSON and text, for server-side PDF processing and command-line use. ☆2,073 · Updated 2 months ago
- Node module that summarizes text using a naive summarization algorithm ☆770 · Updated 5 months ago
- Natural language processor powered by plugins, part of the @unifiedjs collective ☆2,395 · Updated 2 months ago
- Straightforward fuzzy matching, information retrieval and NLP building blocks for JavaScript. ☆713 · Updated 9 months ago
- Access control lists for Node applications ☆2,626 · Updated last year
- Check if the internet connection is up ☆1,253 · Updated 8 months ago
- A search server that can be installed with npm ☆654 · Updated this week
- This repo isn't maintained anymore, as PhantomJS was deprecated a long time ago. Please migrate to headless Chrome/Puppeteer. ☆3,562 · Updated 10 months ago
- A module to create readable `"multipart/form-data"` streams. Can be used to submit forms and file uploads to other web applications. ☆2,308 · Updated 2 weeks ago
- Flexible ASCII progress bar for Node.js ☆2,984 · Updated 2 years ago
- fs with incremental backoff on EMFILE ☆1,284 · Updated 8 months ago
- Distribute processing tasks to child processes with an über-simple API and baked-in durability and custom concurrency options. ☆1,744 · Updated 3 years ago
- Node.js library for parsing crontab instructions ☆1,361 · Updated this week
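One entry in the list above describes string similarity based on Dice's coefficient. The JavaScript sketch below shows roughly what that measure computes: 2·|A∩B| / (|A| + |B|), where A and B are the character bigrams of each string. It is not that library's code. Stripping whitespace and counting repeated bigrams as a multiset are assumptions of this sketch, and the names `bigrams` and `diceCoefficient` are hypothetical.

```javascript
// Hypothetical helper: count adjacent character pairs (bigrams) as a multiset.
function bigrams(s) {
  const counts = new Map();
  for (let i = 0; i < s.length - 1; i++) {
    const bg = s.slice(i, i + 2);
    counts.set(bg, (counts.get(bg) || 0) + 1);
  }
  return counts;
}

// Hypothetical helper: Dice's coefficient over bigram multisets,
// 2 * |A ∩ B| / (|A| + |B|), returning a value in [0, 1].
function diceCoefficient(a, b) {
  // Assumption: whitespace is ignored before comparing.
  a = a.replace(/\s+/g, "");
  b = b.replace(/\s+/g, "");
  if (a === b) return 1;
  // Strings shorter than 2 characters have no bigrams to compare.
  if (a.length < 2 || b.length < 2) return 0;

  const aCounts = bigrams(a);
  const bCounts = bigrams(b);

  // Multiset intersection: a shared bigram counts as often as it
  // appears in both strings.
  let overlap = 0;
  for (const [bg, count] of aCounts) {
    overlap += Math.min(count, bCounts.get(bg) || 0);
  }

  // A string of length n has n - 1 bigrams.
  return (2 * overlap) / (a.length - 1 + (b.length - 1));
}
```

As an example, `diceCoefficient("healed", "sealed")` finds that four of the five bigrams on each side are shared, which gives 2·4 / 10 = 0.8. `diceCoefficient("night", "nacht")` shares only "ht" and gives 0.25.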