rgriffogoes / scraper-notebook
Jupyter Docker stack image with pre-installed scraper tools and libraries
☆25 Updated 2 years ago
Related projects
Alternatives and complementary repositories for scraper-notebook
- Daily TV News Summary using GPT ☆21 Updated 7 months ago
- ScrapingAnt API client for Python. ☆35 Updated 3 months ago
- Crawl a website to generate knowledge file for RAG ☆17 Updated 2 months ago
- Reads HTML files, converting tables into CSV files ☆31 Updated 4 years ago
- Marp Editor for @standardnotes. Create presentations with Marp and Marpit Markdown | https://marpeditor.com ☆29 Updated 3 years ago
- List of tools for dealing with the wonderful PDF format. ☆46 Updated 4 years ago
- Matomo plugin for Docusaurus v2/v3 ☆12 Updated 11 months ago
- Jurisdiction ID and abbreviation data files for using with Jurism and other projects. ☆33 Updated last year
- Add browser pages to your local YACY index ☆15 Updated last year
- A Python3, async interface to the linkding REST API ☆17 Updated this week
- Clean, filter and sample URLs to optimize data collection – Python & command-line – Deduplication, spam, content and language filters ☆125 Updated last week
- Top 15K of GitHub's finest. ☆52 Updated this week
- Tools for interactive visual exploration of semantic embeddings. ☆28 Updated 2 months ago
- Telegram > OpenAI > Read Later [instapaper/pocket/omnivore] ☆16 Updated last year
- Python bindings for Upwork API (OAuth2) ☆37 Updated last year
- Automated behaviors that run in browser to interact with complex sites automatically. Used by ArchiveWeb.page and Browsertrix Crawler. ☆33 Updated last month
- 📑 Python Package to reconstruct the original continuous text from PDFs with language models ☆33 Updated last year
- Python based Wikidata framework for easy dataframe extraction ☆39 Updated 11 months ago
- Filter RSS Feed with GPT-4 ☆16 Updated last year
- A News Article Collection Library ☆22 Updated last year
- Minimalist CI/CD for your Gitea ☆31 Updated 5 months ago
- Tailscale in Docker without elevated privileges ☆53 Updated 11 months ago
- Extract networks of entities from journalistic reporting ☆47 Updated last year
- Datasette pre-configured with useful plugins. Experimental alpha. ☆27 Updated 5 months ago
- A repository of custom widgets to embed in Grist documents ☆56 Updated this week
- ☆13 Updated 8 months ago
- Sync worklogs between multiple time trackers, invoicing, and bookkeeping software. ☆27 Updated 4 months ago
- Little developer/power tools. ☆13 Updated 6 months ago
- Python Module to use the Readwise API ☆16 Updated this week
- Quick insights from Zoom meeting transcripts using Graph + NLP ☆13 Updated 2 years ago