noelmartinon / mboxzilla
Export / upload emails from Thunderbird mbox files to single eml files
☆23 Updated 2 years ago
Alternatives and similar repositories for mboxzilla
Users who are interested in mboxzilla are comparing it to the libraries listed below.
- Simple tools for summarizing .mbox email archives. ☆11 Updated 5 years ago
- A wrapper for tesseract / abbyyOCR11 ocr4linux finereader cli that can perform batch operations or monitor a directory and launch an OCR … ☆65 Updated last year
- Automatic de-keystoning for single camera DIY book scanners. ☆49 Updated 4 years ago
- ScanTailor Universal - a fork based on Enhanced+Featured+Master versions of ST ☆215 Updated 3 months ago
- ☆11 Updated 6 years ago
- Recover lost websites from the Web Infrastructure ☆89 Updated 4 years ago
- Near-duplicate detection tool ☆24 Updated 8 years ago
- Trough: Big data, small databases. ☆42 Updated 11 months ago
- The Bibliotheca Anonoma's own Bing Cache and Google Cache scraper scripts. Unlike most of the other ones you've seen, these actually work… ☆28 Updated 7 years ago
- Converts a Yahoo group archive created by yahoo-group-archiver into standalone email, mbox folders, and PDF files ☆22 Updated 3 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆42 Updated last week
- Self-contained JBIG2 compressor for PDF files ☆13 Updated 7 years ago
- Serving content from a WARC ☆61 Updated 12 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 7 years ago
- Unicode Character Finder ☆30 Updated 8 months ago
- Documentation and scripts for book scanning using free software tools ☆20 Updated 9 years ago
- A Memento Aggregator CLI and Server in Go ☆65 Updated 3 months ago
- An HTML to Asciidoc converter written in JavaScript ☆23 Updated 10 years ago
- The Markdown Guide for the Perplexed ☆24 Updated 8 years ago
- Simplified version of a common crawl fetcher ☆15 Updated this week
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc. ☆18 Updated last year
- User contributed (non Google) OCR models for Tesseract ☆26 Updated 2 months ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆46 Updated 7 years ago
- Comparing WARC files ☆17 Updated 6 years ago
- Dump the content of .enex files, preserving attachments, some metadata and optionally converting notes to Markdown. ☆74 Updated 5 years ago
- Parsing and extracting information from (possibly malformed) HTML/XML documents ☆10 Updated last year
- Tools for parsing Hungarian legal documents ☆16 Updated 2 years ago
- A collection of tools for archiving and analysing the internet. ☆77 Updated 2 years ago
- Reads HTML files, converting tables into CSV files ☆31 Updated 5 years ago
- Simple script to convert web resources to a single WARC file ☆21 Updated 2 years ago