node.js module for extracting text from html, pdf, doc, docx, xls, xlsx, csv, pptx, png, jpg, gif, rtf and more!
☆1,695 · Dec 15, 2025 · Updated 5 months ago
Alternatives and similar repositories for textract
Users that are interested in textract are comparing it to the libraries listed below.
- Standalone Office Open XML files (Microsoft Office 2007 and later) generator for Word (docx), PowerPoint (pptx) and Excel (xlsx) in java… ☆2,711 · Apr 30, 2024 · Updated 2 years ago
- converts binary PDF to JSON and text, for server-side PDF processing and command-line use. Zero dependency. ☆2,205 · Apr 16, 2026 · Updated last month
- Parse office documents (doc, docx, xls, etc..) ☆183 · Apr 14, 2014 · Updated 12 years ago
- Automatically extract body content (and other cool stuff) from an html document ☆2,162 · May 26, 2023 · Updated 3 years ago
- general natural language facilities for node ☆10,878 · Feb 22, 2026 · Updated 3 months ago
- The next web scraper. See through the <html> noise. ☆5,906 · May 6, 2026 · Updated last month
- Convert Word documents (.docx files) to HTML ☆6,225 · May 24, 2026 · Updated 2 weeks ago
- An image processing library written entirely in JavaScript for Node, with zero external or native dependencies. ☆14,618 · Apr 7, 2026 · Updated 2 months ago
- Pure Javascript OCR for more than 100 Languages 📖🎉🖥