commoncrawl / whirlwind-python
A whirlwind tour of Common Crawl's data using Python
☆17 · Updated 2 months ago
Alternatives and similar repositories for whirlwind-python:
Users who are interested in whirlwind-python are comparing it to the repositories listed below.
- A diagram of my personal infrastructure ☆46 · Updated 4 years ago
- Quality News - Towards a fairer ranking formula for Hacker News ☆81 · Updated 2 weeks ago
- Technical blogs around data collaboration, data management, and building collaborative applications. ☆45 · Updated last year
- search interface for scholarly works ☆84 · Updated 7 months ago
- 🗄️ A simple CLI for converting WARC to Parquet. ☆109 · Updated 2 weeks ago
- A probabilistic approximate DNF counter ☆36 · Updated 10 months ago
- ☆13 · Updated last year
- 🔎 A Prodigy plugin for evaluating spaCy pipelines ☆13 · Updated 11 months ago
- A Collection of Awesome Personal Search Engines and Related Projects ☆18 · Updated 2 years ago
- Tool for cleaning old and redundant backups ☆13 · Updated last month
- A fluid medium for storing, relating, and surfacing thoughts. ☆131 · Updated 2 years ago
- Web interface for Cayley ☆25 · Updated last year
- Testing various image matching algorithms' performance on the Pinecone vector DB ☆43 · Updated last year
- a graph definition and execution library for python ☆16 · Updated last year
- A repository of mathematical knowledge written in the Mathlingua language. ☆17 · Updated 3 months ago
- A [personal]<-[notebook]->[network]. Complete with custom numerics for constrained Gaussian gravitation physics. ☆22 · Updated 2 years ago
- Open source scholarly literature search ☆15 · Updated 5 months ago
- Datasette plugin for searching all searchable tables at once ☆23 · Updated 5 months ago
- Tree Notation Python Library ☆14 · Updated 2 years ago
- Questions from the Ham Radio General pool ☆14 · Updated 9 months ago
- A description of the relationship between databases, collaboration and Kripke ☆27 · Updated 3 years ago
- Handy decorator for elegant design-by-contract in 3.10+ ☆102 · Updated 2 years ago
- Gavin Mendel-Gleason's blog ☆89 · Updated last year
- CLI tool for exploring arXiv (inspired by karpathy's brilliant ArXiv Sanity Preserver) ☆39 · Updated 2 weeks ago
- Beating the `bisect` module's implementation using C-extensions. ☆30 · Updated last year
- Tools for running OCR against files stored in S3 ☆119 · Updated 2 years ago
- ☆58 · Updated last year
- Backend, IA-specific tools for crawling and processing the scholarly web. Content ends up in https://fatcat.wiki ☆25 · Updated 7 months ago
- Walk the AST for every callable in your code. ☆18 · Updated 2 months ago
- Create matplotlib visualizations from the command-line ☆50 · Updated 2 years ago