maxcountryman / warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
☆113 Updated 9 months ago
Alternatives and similar repositories for warc-parquet
Users who are interested in warc-parquet are comparing it to the libraries listed below.
- ☆107 Updated 6 months ago
- Scale to zero Seafowl hosting with Cloud Run ☆37 Updated 2 years ago
- A safe, stateful rules language for event streams ☆114 Updated 2 years ago
- Create a SQLite database containing metadata from Google Drive ☆162 Updated 8 months ago
- the fastest CSV SQLite extension, written in Rust ☆139 Updated 9 months ago
- ZSV Utility for converting json to/from zip-separated-values ☆56 Updated last year
- Testing various image matching algorithms' performance on the Pinecone vector DB ☆43 Updated 2 years ago
- SQLite3 extension for read-only HTTP(S) database access ☆57 Updated last year
- ☆26 Updated last year
- SQL transformation tool for DuckDB written in Rust ☆72 Updated 8 months ago
- WarcDB: Web crawl data as SQLite databases. ☆406 Updated last year
- Zig library for HyperLogLog estimation ☆91 Updated last year
- Beating the `bisect` module's implementation using C-extensions. ☆30 Updated 2 years ago
- Shell scripting for serverless ☆141 Updated 3 years ago
- Gavin Mendel-Gleason's blog ☆88 Updated last year
- ☆163 Updated last year
- EBNF specification of the BBC's shipping forecast ☆43 Updated 3 years ago
- ayb makes it easy to create databases, share them with collaborators, and query them from anywhere ☆78 Updated last week
- The Directed Acyclic Graph Elevation Markup Language ☆81 Updated 6 months ago
- Tools for running OCR against files stored in S3 ☆119 Updated 3 years ago
- Multi-model transactional embedded database ☆68 Updated 11 months ago
- ☆112 Updated last year
- Fast similarity search using DuckDB ☆141 Updated last year
- What if an HNSW index was just a file, and you could serve it from a CDN, and search it directly in the browser? ☆107 Updated 7 months ago
- Parallelism and preemptive concurrency for sporadic workloads ☆46 Updated 11 months ago
- A SQLite extension for extracting values from serialized Protobuf messages ☆88 Updated 4 months ago
- Foundation DB Query Language ☆147 Updated this week
- Template for sending a GitHub webhook over a zero trust, private network based on https://github.com/openziti/ziti ☆59 Updated 2 years ago
- A library for parsing and executing Excel-style formulas ☆57 Updated 2 years ago
- ☆47 Updated 3 months ago