maxcountryman / warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
☆112 Updated 7 months ago
Alternatives and similar repositories for warc-parquet
Users who are interested in warc-parquet are comparing it to the repositories listed below.
- ☆108 Updated 4 months ago
- Scale to zero Seafowl hosting with Cloud Run ☆37 Updated 2 years ago
- A safe, stateful rules language for event streams ☆114 Updated 2 years ago
- SQLite3 extension for read-only HTTP(S) database access ☆55 Updated last year
- ZSV Utility for converting json to/from zip-separated-values ☆56 Updated last year
- the fastest CSV SQLite extension, written in Rust ☆138 Updated 7 months ago
- Testing various image matching algorithms' performance on the Pinecone vector DB ☆43 Updated 2 years ago
- Zig library for HyperLogLog estimation ☆89 Updated last year
- ☆163 Updated last year
- Gavin Mendel-Gleason's blog ☆88 Updated last year
- ayb makes it easy to create databases, share them with collaborators, and query them from a web application or the command line ☆76 Updated this week
- WarcDB: Web crawl data as SQLite databases. ☆406 Updated last year
- Fast similarity search using DuckDB ☆140 Updated 10 months ago
- A tool for creating a repository of transcribed videos ☆53 Updated last year
- A Go program to split large JSON files into many jsonl files ☆61 Updated 2 years ago
- What if an HNSW index was just a file, and you could serve it from a CDN, and search it directly in the browser? ☆107 Updated 5 months ago
- Beating the `bisect` module's implementation using C-extensions. ☆30 Updated 2 years ago
- SQL transformation tool for DuckDB written in Rust ☆70 Updated 6 months ago
- jq extension for SQLite. ☆101 Updated last year
- Code to accompany blog post https://reorchestrate.com/posts/sqlite-transactions ☆65 Updated last year
- The fastest 128-bit and 256-bit hash, passes all tests, and under 140 source lines of code. API library and CLI tool in C++ and NodeJS/Wa… ☆126 Updated 7 months ago
- ☆111 Updated last year
- Reverse Geocode for OpenStreetmap ☆129 Updated last year
- off the charts color quantization 🎨 ☆152 Updated 2 months ago
- A Unikernel running WebAssembly code ☆50 Updated 2 years ago
- Shell scripting for serverless ☆140 Updated 3 years ago
- Block Erasure Format - An extensible, fast, and usable file utility to encode and decode interleaved erasure coded streams of data. ☆58 Updated last year
- A library for parsing and executing Excel-style formulas ☆58 Updated 2 years ago
- Create a SQLite database containing metadata from Google Drive ☆161 Updated 6 months ago
- Foundation DB Query Language ☆145 Updated 2 weeks ago