A whirlwind tour of Common Crawl's data using Python
☆45 · Apr 13, 2026 · Updated last month
Alternatives and similar repositories for whirlwind-python
Users that are interested in whirlwind-python are comparing it to the libraries listed below.
- A tool for detecting viruses and NSFW material in WARC files · ☆18 · Updated this week
- Selected code and data for The Online Books Page and related applications · ☆11 · Jun 1, 2026 · Updated last week
- How Media Cloud approaches extracting metadata from online news stories · ☆17 · Apr 15, 2026 · Updated last month
- Illuminating the scope and content of digital text collections · ☆13 · Jul 28, 2015 · Updated 10 years ago
- Associated blog post: https://tristanrhodes.com/blog/Adventures-in-Algorithmic-Trading-on-the-Runescape-Grand-Exchange · ☆10 · Oct 14, 2024 · Updated last year
- Crowd-sourced lists of URLs to help Common Crawl crawl under-resourced languages. See https://github.com/commoncrawl/web-languages-code/ … · ☆69 · Updated this week
- MCP Ethical Hacking Security sample for educational purposes · ☆19 · Sep 16, 2025 · Updated 8 months ago
- A CLI tool to clean up your development mess · ☆12 · Jan 17, 2026 · Updated 4 months ago
- ☆11 · Aug 29, 2020 · Updated 5 years ago
- ☆26 · Sep 3, 2025 · Updated 9 months ago
- An Ethereum dApp for aggregating peer review · ☆10 · Dec 22, 2022 · Updated 3 years ago
- Deprecated: this code has been moved into a class of ao_core, which requires a private beta license. This repo is kept up for posterity … · ☆11 · Mar 5, 2025 · Updated last year
- keyboard-layout pools all the files needed to set up my custom XKB keyboard layout (takbl) on Ubuntu Linux · ☆13 · Feb 5, 2024 · Updated 2 years ago
- Evernote MCP server: allows LLMs that support MCP (like Claude Desktop) to query your notes in Evernote · ☆53 · Mar 25, 2026 · Updated 2 months ago
- Vietnamese GPT-J API service deployed with Docker and a Helm chart · ☆10 · Dec 11, 2022 · Updated 3 years ago
- Post a thread easily on Bluesky · ☆16 · Oct 28, 2024 · Updated last year
- An MCP server for octomind tools, resources, and prompts · ☆22 · Feb 26, 2026 · Updated 3 months ago
- NetworkManager integration in Emacs · ☆17 · Aug 22, 2022 · Updated 3 years ago
- Chrome extension that uses Memento to indicate that a page a user is viewing on the live web has an archived copy and to give the user ac… · ☆58 · Aug 27, 2025 · Updated 9 months ago
- Datasette plugin adding an llm_embed(model_id, text) SQL function · ☆18 · Mar 17, 2024 · Updated 2 years ago
- Datasette plugin providing a UI for executing SQL writes against the database · ☆12 · Nov 11, 2025 · Updated 7 months ago
- Use MobileNet SSD and OpenCV to detect and count cars on the road · ☆11 · Jan 13, 2020 · Updated 6 years ago
- ☆23 · Dec 9, 2025 · Updated 6 months ago
- Emacs wrapper for the Pidgin instant messenger · ☆18 · Mar 3, 2010 · Updated 16 years ago
- Software Engineering Back End Microservices Project · ☆15 · Nov 20, 2024 · Updated last year
- ☆14 · Jun 29, 2025 · Updated 11 months ago
- Datasette plugin for working with Apple's binary plist format · ☆14 · Feb 17, 2023 · Updated 3 years ago
- Redis backend for CherryPy sessions · ☆22 · Feb 21, 2023 · Updated 3 years ago
- Java library for reading and writing WARC files with a typed API · ☆59 · Apr 27, 2026 · Updated last month
- MATLAB/Octave generator for Hamming ECC coding, with output in Verilog HDL · ☆12 · Dec 27, 2022 · Updated 3 years ago
- Experimental repository for named-entity recognition (NER) on Ukrainian-language sentences · ☆13 · Aug 13, 2021 · Updated 4 years ago
- ☆16 · May 18, 2026 · Updated 3 weeks ago
- Detects air particulate matter (PM1, PM2.5, PM10) concentrations and sends data to an MQTT server. An alternative firmware for ESP82… · ☆19 · Feb 19, 2020 · Updated 6 years ago
- ☆30 · Jun 2, 2026 · Updated last week
- Demo of using Airflow · ☆11 · Jun 24, 2022 · Updated 3 years ago
- A polite and user-friendly downloader for Common Crawl data · ☆82 · May 4, 2026 · Updated last month
- [ICLR26] AI-based scaling law discovery · ☆28 · Jan 30, 2026 · Updated 4 months ago
- Support for training SSD on TF2 · ☆12 · Mar 29, 2023 · Updated 3 years ago
- WebRTC-HTTP Ingestion Protocol (WHIP) in Rust · ☆15 · Dec 17, 2025 · Updated 5 months ago