dbashford / textract
node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
☆1,643 · Updated 2 years ago
Related projects
Alternatives and complementary repositories for textract
- Node PDF Extract ☆384 · Updated last year
- converts binary PDF to JSON and text, for server-side PDF processing and command-line use. ☆2,017 · Updated 2 weeks ago
- Distribute processing tasks to child processes with an über-simple API and baked-in durability & custom concurrency options. ☆1,747 · Updated 2 years ago
- Node.js module for high performance creation, modification and parsing of PDF files and streams ☆1,147 · Updated last month
- Full featured CSV parser with simple api and tested against large datasets. ☆4,049 · Updated last week
- 🚜 Parse text and tables from PDF files. ☆633 · Updated 2 weeks ago
- A wrapper for the wkhtmltopdf HTML to PDF converter using WebKit ☆606 · Updated last year
- Download and extract files ☆1,284 · Updated last year
- rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects) ☆1,917 · Updated 6 years ago
- Extra JavaScript string methods. ☆1,809 · Updated 3 years ago
- A simple wrapper for the Tesseract OCR package ☆675 · Updated 4 years ago
- A search server that can be installed with npm ☆655 · Updated 2 months ago
- Flexible event driven crawler for node. ☆2,141 · Updated 3 years ago
- BSON Parser for node and browser ☆1,149 · Updated this week
- A javascript library for defining recurring schedules and calculating future (or past) occurrences for them. Includes support for using … ☆2,418 · Updated 6 years ago
- Promisify a callback-style function ☆1,506 · Updated 2 years ago
- fs with incremental backoff on EMFILE ☆1,272 · Updated 3 months ago
- A persistent, network resilient, full text search library for the browser and Node.js ☆1,390 · Updated last month
- CSV parsing implementing the Node.js `stream.Transform` API ☆803 · Updated 3 years ago
- Check if the internet connection is up ☆1,239 · Updated 3 months ago
- Standalone Office Open XML files (Microsoft Office 2007 and later) generator for Word (docx), PowerPoint (pptx) and Excel (xlsx) in java… ☆2,655 · Updated 6 months ago
- Node module that summarizes text using a naive summarization algorithm ☆769 · Updated last month
- An async libmagic binding for node.js for detecting content types by data inspection ☆620 · Updated 5 months ago
- Measure the difference between two strings with the fastest JS implementation of the Levenshtein distance algorithm ☆715 · Updated 3 years ago
- Flexible ascii progress bar for nodejs ☆2,975 · Updated last year
- A node module for Google's Universal Analytics and Measurement Protocol ☆960 · Updated last year
- Command Line UI toolkit for Node.js ☆1,660 · Updated 4 years ago
- PDF manipulation in Node.js! Split, join, crop, read, extract, boil, mash, stick them in a stew. ☆285 · Updated 5 months ago
- Lightweight Web Worker API implementation with native threads ☆2,297 · Updated 3 years ago
- A streaming approach to JSON. Oboe.js speeds up web applications by providing parsed objects before the response completes. ☆4,790 · Updated last month