Florents-Tselai / WarcDB
WarcDB: Web crawl data as SQLite databases.
☆404 Updated last year
Alternatives and similar repositories for WarcDB
Users who are interested in WarcDB are comparing it to the libraries listed below.
- A SQLite extension which loads a Google Sheet as a virtual table. ☆519 Updated 2 years ago
- A SQLite extension for reading large files line-by-line (NDJSON, logs, txt, etc.) ☆403 Updated last year
- A SQLite extension for querying, manipulating, and creating HTML elements. ☆387 Updated 2 years ago
- Python module to parse ingredient names. Splitting them into the ingredient, unit and quantity. It is trained on a publicly available dat… ☆153 Updated last year
- Create a SQLite database containing metadata from Google Drive ☆161 Updated 5 months ago
- 🗄️ A simple CLI for converting WARC to Parquet. ☆112 Updated 6 months ago
- A self hosted recommendation feed generated from your browsing habits ☆314 Updated 2 years ago
- A self-hosted live video streaming platform with Discord authentication, auto-recording and more! ☆349 Updated last year
- Repository for Pipes ☆273 Updated 2 weeks ago
- a simple website for sharing table data - with an API ☆395 Updated 2 months ago
- Shell scripting for serverless ☆140 Updated 3 years ago
- Query SQLite files in S3 using s3fs ☆509 Updated 2 years ago
- Functional UUIDs for Python. ☆148 Updated 4 years ago
- Command-line tool to remotely execute code in the cloud ☆134 Updated 3 years ago
- Dirty Little SQL Notebook ☆114 Updated 2 years ago
- Template repository for setting up shot-scraper ☆255 Updated 5 months ago
- DataHen Till is a companion tool to your existing web scraper that instantly makes it scalable, maintainable, and more unblockable, with … ☆818 Updated 3 years ago
- Geocode rows in a SQLite database table ☆238 Updated 2 years ago
- Scrapy rotation proxy package with advanced functions ☆95 Updated 3 years ago
- ☆114 Updated 4 years ago
- Replace Splunk in your small company with this one weird trick! ☆411 Updated 5 months ago
- 🕟 date and time processing language ☆300 Updated 2 years ago
- Rsync-based time machine for Linux, written in Python, for local and remote backups. ☆170 Updated 2 years ago
- Module Oriented Large Archive Specialized Slow Exhaustive Searcher ☆113 Updated 10 years ago
- Simple Python Calculation Engine ☆134 Updated 3 years ago
- Minimalist log collector ☆114 Updated 7 months ago
- A generator for OpenAPI 3 ☆97 Updated 4 years ago
- α-Indirect Control in Onion-like Networks ☆148 Updated last year
- Reverse Geocode for OpenStreetmap ☆129 Updated 11 months ago
- Web clipper browser extension for saving highlights, screenshots, and automatically extracting content from web pages. ☆374 Updated 3 years ago