maxcountryman / warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
☆109 · Updated 2 weeks ago
Alternatives and similar repositories for warc-parquet:
Users who are interested in warc-parquet are comparing it to the libraries listed below.
- ☆108 · Updated 7 months ago
- Gavin Mendel-Gleason's blog ☆89 · Updated last year
- Scale to zero Seafowl hosting with Cloud Run ☆38 · Updated last year
- ZSV Utility for converting json to/from zip-separated-values ☆58 · Updated 9 months ago
- Fast similarity search using DuckDB ☆121 · Updated 4 months ago
- Testing various image matching algorithms' performance on the Pinecone vector DB ☆43 · Updated last year
- the fastest CSV SQLite extension, written in Rust ☆129 · Updated 3 weeks ago
- Code to accompany blog post https://reorchestrate.com/posts/sqlite-transactions ☆66 · Updated 7 months ago
- What if an HNSW index was just a file, and you could serve it from a CDN, and search it directly in the browser? ☆94 · Updated 9 months ago
- Multi-model transactional embedded database ☆67 · Updated 2 months ago
- A library for parsing and executing Excel-style formulas ☆58 · Updated last year
- jq extension for SQLite. ☆93 · Updated 7 months ago
- ☆111 · Updated last year
- A simple performant GeoIP server written in Rust using MaxMind DBs with auto database update ☆51 · Updated this week
- Non-presentation components of websheets ☆40 · Updated last year
- SQL Language server and cli ☆84 · Updated this week
- Command-line tool to remotely execute code in the cloud ☆134 · Updated 2 years ago
- abuse ImageMagick (or GraphicsMagick) to create arbitrary files ☆53 · Updated this week
- Shell scripting for serverless ☆141 · Updated 2 years ago
- Beating the `bisect` module's implementation using C-extensions. ☆30 · Updated last year
- Dumfederated gRPC social network implemented in Rust/Tonic/Diesel with a bundled React (web+native) frontend. 🐕💩EZ to deploy to your k8… ☆65 · Updated last week
- Create a SQLite database containing metadata from Google Drive ☆156 · Updated 2 years ago
- ☆163 · Updated 9 months ago
- webidx is a client-side search engine for static websites. ☆58 · Updated last week
- WarcDB: Web crawl data as SQLite databases. ☆398 · Updated 7 months ago
- A small language that compiles to WebAssembly Text format ☆74 · Updated 10 months ago
- Module Oriented Large Archive Specialized Slow Exhaustive Searcher ☆113 · Updated 9 years ago
- HNSW implementation in Rust. Reference: https://arxiv.org/ftp/arxiv/papers/1603/1603.09320.pdf ☆224 · Updated 3 months ago
- A copy of ONNX models, datasets, and code all in one GitHub repository. Follow the README to learn more. ☆104 · Updated last year
- Block Erasure Format - An extensible, fast, and usable file utility to encode and decode interleaved erasure coded streams of data. ☆58 · Updated 9 months ago