maxcountryman / warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
☆110Updated 4 months ago
Alternatives and similar repositories for warc-parquet
Users that are interested in warc-parquet are comparing it to the libraries listed below
- Scale to zero Seafowl hosting with Cloud Run☆37Updated 2 years ago
- ☆109Updated last month
- ZSV Utility for converting json to/from zip-separated-values☆56Updated last year
- SQL transformation tool for DuckDB written in Rust☆52Updated 3 months ago
- Multi-model transactional embedded database☆68Updated 6 months ago
- Code to accompany blog post https://reorchestrate.com/posts/sqlite-transactions☆65Updated 11 months ago
- What if an HNSW index was just a file, and you could serve it from a CDN, and search it directly in the browser?☆106Updated 2 months ago
- Gavin Mendel-Gleason's blog☆89Updated last year
- Testing various image matching algorithms' performance on the Pinecone vector DB☆43Updated last year
- Reverse Geocode for OpenStreetMap☆129Updated 9 months ago
- Fast similarity search using DuckDB☆133Updated 7 months ago
- the fastest CSV SQLite extension, written in Rust☆134Updated 4 months ago
- ☆163Updated last year
- Command-line tool to remotely execute code in the cloud☆134Updated 3 years ago
- WarcDB: Web crawl data as SQLite databases.☆399Updated 11 months ago
- A Go program to split large JSON files into many jsonl files☆61Updated 2 years ago
- A Higher-Level, Composable SQL☆45Updated this week
- Interactive Python TUI for visualizing and analyzing files with multiple formats☆54Updated 2 months ago
- Create a SQLite database containing metadata from Google Drive☆161Updated 3 months ago
- Documentation and demonstration of how to build WASM versions of SQLite with extensions embedded☆26Updated 2 months ago
- Module Oriented Large Archive Specialized Slow Exhaustive Searcher☆113Updated 9 years ago
- The fastest 128-bit and 256-bit hash, passes all tests, and under 140 source lines of code. API library and CLI tool in C++ and NodeJS/Wa…☆126Updated 5 months ago
- webidx is a client-side search engine for static websites.☆60Updated 3 months ago
- SQL Language server and cli☆85Updated 4 months ago
- Zig library for HyperLogLog estimation☆89Updated 11 months ago
- A safe, stateful rules language for event streams☆114Updated last year
- ☆26Updated last year
- Dumfederated gRPC social network implemented in Rust/Tonic/Diesel with a bundled React (web+native) frontend. 🐕💩EZ to deploy to your k8…☆65Updated last week
- A copy of ONNX models, datasets, and code all in one GitHub repository. Follow the README to learn more.☆105Updated last year
- Uniform eXchange Format (uxf) is a plain text human readable optionally typed storage format that supports custom types. It may serve as …☆2Updated last year