bjjanssen / bjjocr
Script to automatically perform zonal OCR on a PDF and rename the PDF according to the results.
☆15Updated 11 years ago
Alternatives and similar repositories for bjjocr
Users who are interested in bjjocr are comparing it to the libraries listed below
- View browser history as a graph (Chrome extension)☆44Updated last year
- A dynamic media input form developed for oTranscribe☆18Updated 10 years ago
- Generate a list of your GitHub stars by topic - automatically!☆83Updated 2 years ago
- Data collection for Airbnb business☆13Updated 10 years ago
- A Python script that converts articles from an RSS feed into a single Ebook (ePub, PDF, docx - pandoc powered).☆22Updated 7 years ago
- Read files (pdf/png/jpg) with OCR and rename using AI.☆24Updated last year
- Clean a series of links, resolving redirects and finding Wayback results if page is gone. Originally written to aid with importing from A…☆18Updated 10 months ago
- Image to text recognition for ISBN numbers from books.☆14Updated 2 years ago
- TSCron ... a Google Form based Cron scheduler powered by Google Apps Script.☆22Updated 3 years ago
- Audio (and video) player for oTranscribe☆27Updated 9 years ago
- WIP tag-based file organizer & search☆39Updated last year
- Manage, generate, and convert chapters for podcasts and other media via CLI and web☆36Updated 4 months ago
- In Jan 2021 I moved around 29k notes from Evernote to markdown. These are the scripts I used to clean up, validate, maintain the markdown…☆36Updated 3 years ago
- LLM plugin for embeddings using sentence-transformers☆70Updated 4 months ago
- 💡✏️️ ⬇️️ JSON to Markdown converter - Generate Markdown from format independent JSON☆74Updated 6 years ago
- ChatGPT Conversations to Markdown is a Python script that converts your exported ChatGPT conversations into readable and well-formatted M…☆63Updated 10 months ago
- Javascript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page…☆40Updated 11 months ago
- Python tool to monitor RSS feeds and download the linked content.☆15Updated 7 years ago
- Powerful command-line tool for slicing & dicing HTML☆37Updated 2 years ago
- Just Refs - extract just the references and related topics from any page on the English Wikipedia☆13Updated 5 years ago
- Command-line program for organizing and managing ebook collections. It is a Python port of the original ebook-tools shell scripts☆23Updated last year
- A dockerized, queued high fidelity web archiver based on Squidwarc☆61Updated last year
- Filter RSS Feed with GPT-4☆16Updated 2 years ago
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler.☆48Updated this week
- Check out https://github.com/webrecorder/webrecorder for newer version matching https://webrecorder.io☆38Updated 9 years ago
- Tool that extracts the text from web article URLs and provides word frequencies, entity recognition, automatic summaries, and more☆20Updated 6 years ago
- HTML tables are underrated☆21Updated 3 weeks ago
- Writing argparse-based command line applications can become tedious, repetitive, and difficult to do right. Relax and let this library fr…☆11Updated 2 months ago
- A curated list of my GitHub stars!☆36Updated this week
- DigestBox takes any webpage URL (news article, video link, comment thread, etc.) and gives you just the raw content. It's powered by Arch…☆19Updated last year