dbashford / textract
node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
☆1,657 Updated 2 years ago
Alternatives and similar repositories for textract:
Users who are interested in textract are comparing it to the libraries listed below.
- converts binary PDF to JSON and text, for server-side PDF processing and command-line use. ☆2,037 Updated 2 weeks ago
- a streaming interface for archive generation ☆2,843 Updated 2 months ago
- Node.js module for high performance creation, modification and parsing of PDF files and streams ☆1,149 Updated 3 months ago
- 🚜 Parse text and tables from PDF files. ☆650 Updated last month
- Advanced html to text converter ☆1,623 Updated last year
- Check if the internet connection is up ☆1,245 Updated 5 months ago
- Download and extract files ☆1,289 Updated last year
- Node module to allow for easy Excel file creation ☆1,376 Updated 2 years ago
- A Javascript implementation of zip for nodejs. Allows user to create or extract zip files both in memory or to/from disk ☆2,075 Updated 3 months ago
- Easy website screenshots in Node.js ☆2,121 Updated 5 years ago
- CSV parser and formatter for node ☆1,680 Updated this week
- A wrapper for the wkhtmltopdf HTML to PDF converter using WebKit ☆609 Updated last year
- HTML to PDF or image (jpeg, png, webp) converter via Chrome/Chromium ☆778 Updated last week
- Flatten/unflatten nested Javascript objects ☆1,791 Updated 4 months ago
- Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance. ☆2,526 Updated last year
- A javascript library for defining recurring schedules and calculating future (or past) occurrences for them. Includes support for using … ☆2,418 Updated 6 years ago
- This repo isn't maintained anymore as phantomjs got deprecated a long time ago. Please migrate to headless chrome/puppeteer. ☆3,559 Updated 8 months ago
- A persistent, network resilient, full text search library for the browser and Node.js ☆1,395 Updated 2 months ago
- Scrape/Crawl article from any site automatically. Make any web page readable, no matter Chinese or English. ☆343 Updated 6 years ago
- PhantomJS integration module for NodeJS ☆3,534 Updated 5 years ago
- Full featured CSV parser with simple api and tested against large datasets. ☆4,090 Updated last month
- high level amazon s3 client for node.js ☆1,002 Updated 4 years ago
- Promisify a callback-style function ☆1,505 Updated 2 years ago
- Distribute processing tasks to child processes with an über-simple API and baked-in durability & custom concurrency options. ☆1,743 Updated 3 years ago
- Superseded by abstract-level. A wrapper for abstract-leveldown compliant stores, for Node.js and browsers. ☆4,085 Updated last month
- A module to create readable `"multipart/form-data"` streams. Can be used to submit forms and file uploads to other web applications. ☆2,295 Updated 3 months ago
- Imagemagick module for NodeJS — NEW MAINTAINER: @yourdeveloper ☆1,816 Updated 4 years ago
- A Portable Document Format (PDF) generation library targeting both the server- and client-side. ☆784 Updated last year
- An XML builder for node.js ☆918 Updated 5 months ago
- PDF manipulation in Node.js! Split, join, crop, read, extract, boil, mash, stick them in a stew. ☆285 Updated 6 months ago