dbashford / textract
node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
☆1,689 Updated 2 weeks ago
Alternatives and similar repositories for textract
Users who are interested in textract are comparing it to the libraries listed below.
- Node.js module for high performance creation, modification and parsing of PDF files and streams ☆1,173 Updated last month
- Advanced html to text converter ☆1,682 Updated 2 years ago
- converts binary PDF to JSON and text, for server-side PDF processing and command-line use. Zero dependency. ☆2,175 Updated last week
- Node PDF Extract ☆389 Updated 2 years ago
- Automatically extract body content (and other cool stuff) from an html document ☆2,161 Updated 2 years ago
- Standalone Office Open XML files (Microsoft Office 2007 and later) generator for Word (docx), PowerPoint (pptx) and Excel (xlsx) in java… ☆2,712 Updated last year
- A persistent, network resilient, full text search library for the browser and Node.js ☆1,423 Updated 8 months ago
- A wrapper for the wkhtmltopdf HTML to PDF converter using WebKit ☆615 Updated 2 years ago
- Nimble, streamable HTTP client for Node.js. With proxy, iconv, cookie, deflate & multipart support. ☆1,635 Updated last month
- 🚜 Parse text and tables from PDF files. ☆696 Updated last month
- Easy website screenshots in Node.js ☆2,119 Updated 6 years ago
- CSV parser and formatter for node ☆1,767 Updated this week
- A simple wrapper for the Tesseract OCR package ☆678 Updated 5 years ago
- A search server that can be installed with npm ☆658 Updated 4 months ago
- Node module that summarizes text using a naive summarization algorithm ☆770 Updated last year
- Decode mime formatted e-mails ☆1,652 Updated 3 weeks ago
- A javascript library for defining recurring schedules and calculating future (or past) occurrences for them. Includes support for using … ☆2,420 Updated 7 years ago
- Machine-learning for Node.js ☆1,053 Updated last month
- Download and extract files ☆1,301 Updated 2 years ago
- rawStream.pipe(JSONStream.parse()).pipe(streamOfObjects) ☆1,934 Updated 7 years ago
- Date() for humans ☆1,481 Updated 3 years ago
- Flexible event driven crawler for node. ☆2,134 Updated 4 years ago
- Finds degree of similarity between two strings, based on Dice's Coefficient, which is mostly better than Levenshtein distance. ☆2,535 Updated 2 years ago
- Robust RSS, Atom, and RDF feed parsing in Node.js ☆1,979 Updated 2 years ago
- Unirest in Node.js: Simplified, lightweight HTTP client library. ☆957 Updated 8 months ago
- Scrape/Crawl article from any site automatically. Make any web page readable, no matter Chinese or English. ☆346 Updated 7 years ago
- natural language processor powered by plugins part of the @unifiedjs collective ☆2,425 Updated 10 months ago
- Full featured CSV parser with simple api and tested against large datasets. ☆4,246 Updated 2 months ago
- Distribute processing tasks to child processes with an über-simple API and baked-in durability & custom concurrency options. ☆1,745 Updated 3 years ago
- Index Mongoose models into elasticsearch automatically. ☆1,072 Updated 2 years ago