maxcountryman / warc-parquet
🗄️ A simple CLI for converting WARC to Parquet.
☆113 Updated 10 months ago
Alternatives and similar repositories for warc-parquet
Users who are interested in warc-parquet are comparing it to the libraries listed below.
- ☆107 Updated 7 months ago
- Scale to zero Seafowl hosting with Cloud Run ☆37 Updated 2 years ago
- A safe, stateful rules language for event streams ☆114 Updated 2 years ago
- Testing various image matching algorithms' performance on the Pinecone vector DB ☆43 Updated 2 years ago
- Zig library for HyperLogLog estimation ☆91 Updated last year
- ZSV Utility for converting json to/from zip-separated-values ☆56 Updated last year
- A Go program to split large JSON files into many jsonl files ☆61 Updated 3 years ago
- SQLite3 extension for read-only HTTP(S) database access ☆57 Updated 2 years ago
- Create a SQLite database containing metadata from Google Drive ☆163 Updated 9 months ago
- ☆26 Updated last year
- ☆165 Updated last year
- ayb makes it easy to create databases, share them with collaborators, and query them from anywhere ☆79 Updated this week
- WarcDB: Web crawl data as SQLite databases. ☆404 Updated last year
- Static analysis and LSP for SQL in Rust ☆88 Updated last week
- Beating the `bisect` module's implementation using C-extensions. ☆32 Updated 2 years ago
- the fastest CSV SQLite extension, written in Rust ☆140 Updated 10 months ago
- ☆54 Updated 7 months ago
- Code to accompany blog post https://reorchestrate.com/posts/sqlite-transactions ☆65 Updated last year
- SQL transformation tool for DuckDB written in Rust ☆72 Updated 9 months ago
- jq extension for SQLite. ☆103 Updated last year
- Tools for running OCR against files stored in S3 ☆120 Updated 3 years ago
- Foundation DB Query Language ☆147 Updated last week
- Multi-model transactional embedded database ☆68 Updated last year
- A library for parsing and executing Excel-style formulas ☆57 Updated 2 years ago
- progscrape.com source ☆96 Updated 3 months ago
- The Directed Acyclic Graph Elevation Markup Language ☆81 Updated 8 months ago
- a file transfer service utilizing quic ☆68 Updated 11 months ago
- Ask questions, let GPT do the SQL. ☆133 Updated 2 years ago
- Fast similarity search using DuckDB ☆142 Updated last year
- Block Erasure Format - An extensible, fast, and usable file utility to encode and decode interleaved erasure coded streams of data. ☆58 Updated last year