maxcountryman / warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
☆108 · Updated last week
Alternatives and similar repositories for warc-parquet:
Users interested in warc-parquet are comparing it to the libraries listed below:
- Scale-to-zero Seafowl hosting with Cloud Run ☆38 · Updated last year
- Testing various image matching algorithms' performance on the Pinecone vector DB ☆43 · Updated last year
- ZSV: utility for converting JSON to/from zip-separated values ☆58 · Updated 7 months ago
- SQL language server and CLI ☆86 · Updated this week
- A small language that compiles to WebAssembly Text format ☆75 · Updated 9 months ago
- WarcDB: Web crawl data as SQLite databases. ☆399 · Updated 6 months ago
- A safe, stateful rules language for event streams ☆113 · Updated last year
- Multi-model transactional embedded database ☆67 · Updated last month
- Zig library for HyperLogLog estimation ☆89 · Updated 6 months ago
- abuse ImageMagick (or GraphicsMagick) to create arbitrary files ☆53 · Updated 2 months ago
- Beating the `bisect` module's implementation using C-extensions. ☆30 · Updated last year
- What if an HNSW index was just a file, and you could serve it from a CDN, and search it directly in the browser? ☆89 · Updated 8 months ago
- Block Erasure Format - An extensible, fast, and usable file utility to encode and decode interleaved erasure coded streams of data. ☆58 · Updated 8 months ago
- Fast similarity search using DuckDB ☆116 · Updated 3 months ago
- webidx is a client-side search engine for static websites. ☆58 · Updated last year
- Documentation and demonstration of how to build WASM versions of SQLite with extensions embedded ☆28 · Updated 2 months ago
- Code to accompany blog post https://reorchestrate.com/posts/sqlite-transactions ☆67 · Updated 6 months ago
- Create a SQLite database containing metadata from Google Drive ☆154 · Updated 2 years ago
- The fastest CSV SQLite extension, written in Rust ☆126 · Updated last year
- SQLite3 extension for read-only HTTP(S) database access ☆53 · Updated last year
- A simple performant GeoIP server written in Rust using MaxMind DBs with auto database update ☆49 · Updated last week
- Module Oriented Large Archive Specialized Slow Exhaustive Searcher ☆113 · Updated 9 years ago
- Tools for running OCR against files stored in S3 ☆118 · Updated 2 years ago
- EBNF specification of the BBC's shipping forecast ☆43 · Updated 2 years ago
- OnionShare; no Flask, just redbean. ☆42 · Updated 2 years ago
- A JS library to incorporate HN comments into any website ☆31 · Updated 8 months ago
- Tiny toolset for compressing and hashing data really fast. ☆29 · Updated last month
- Shell scripting for serverless ☆141 · Updated 2 years ago