rgriffogoes / scraper-notebook
Jupyter Docker stack image with pre-installed scraper tools and libraries
☆26 · Updated 2 years ago
Alternatives and similar repositories for scraper-notebook
Users who are interested in scraper-notebook are comparing it to the libraries listed below.
- 📑 Scripts to repair, verify, OCR, compress, wrangle, crop (etc.) PDFs ☆69 · Updated last year
- LLM plugin for embeddings using sentence-transformers ☆60 · Updated 3 weeks ago
- Scrape HN to track links from specific domains ☆59 · Updated last week
- Deduplicate and parse lists of 'dirty names' ☆21 · Updated 4 years ago
- Hook toolkit for Paperless-ngx with a REST API client written in Go ☆11 · Updated last week
- ☆19 · Updated 2 years ago
- Jurisdiction ID and abbreviation data files for use with Jurism and other projects. ☆36 · Updated last year
- ☆25 · Updated 4 years ago
- Tools for interactive visual exploration of semantic embeddings. ☆33 · Updated 8 months ago
- Dockerized workflow automation tool ☆20 · Updated this week
- Data API and micro ORM for DuckDB and MotherDuck ☆8 · Updated 4 months ago
- SEMRush SERP tutorial: using advertools to extract and analyze search engine results page (SERP) data ☆14 · Updated 6 years ago
- A news article collection library ☆22 · Updated 2 years ago
- A dead-simple REST API that uses Playwright to scrape the text contents of any URL. ☆27 · Updated last year
- A financial disclosure data extraction tool. ☆16 · Updated last year
- ☆11 · Updated 5 months ago
- Human-in-the-loop document classification ☆10 · Updated 3 years ago
- A simple machine learning package to cluster keywords into higher-level groups. ☆17 · Updated 2 years ago
- Hey is a powerful command-line (CLI) chatbot that uses ChatGPT to generate commands from natural-language input ☆41 · Updated 2 years ago
- Daily TV news summary using GPT ☆24 · Updated 5 months ago
- A simple Python wrapper for the Obsidian Local REST API: https://coddingtonbear.github.io/obsidian-local-rest-api/ ☆15 · Updated 2 years ago
- A list of awesome resources for users of Onyx Boox e-ink digital notebooks and ereaders (focused on e-ink devices like Amazon Kindle, Kob… ☆25 · Updated 2 years ago
- JavaScript/Node wrapper around Mozilla's Readability library so that ArchiveBox can call it as a oneshot CLI command to extract each page… ☆40 · Updated 7 months ago
- advertools crawler UI ☆28 · Updated 2 years ago
- A Firefox and Google Chrome extension to clip websites and download them into a readable Markdown file. ☆29 · Updated 6 years ago
- 📑 Python package to reconstruct the original continuous text from PDFs with language models ☆32 · Updated last year
- mdsplit is a Python command-line tool to split Markdown files into chapters at a given heading level ☆48 · Updated 6 months ago
- Telegram > OpenAI > Read Later [instapaper/pocket/omnivore] ☆17 · Updated last year
- Filter RSS feeds with GPT-4 ☆16 · Updated last year
- Add browser pages to your local YaCy index ☆15 · Updated 2 years ago