Various Jupyter notebooks about Common Crawl data
☆64 · Nov 22, 2025 · Updated 4 months ago
Alternatives and similar repositories for cc-notebooks
Users that are interested in cc-notebooks are comparing it to the libraries listed below.
- Index Common Crawl archives in tabular format ☆126 · Updated this week
- Process Common Crawl data with Python and Spark ☆453 · Jan 20, 2026 · Updated 2 months ago
- Tools to construct and process Common Crawl webgraphs ☆105 · Feb 19, 2026 · Updated last month
- Scientific articles using or citing Common Crawl data ☆28 · Updated this week
- Deployment of pywb as a CommonCrawl Index Server ☆21 · Oct 6, 2017 · Updated 8 years ago
- LiT (Zero-Shot Transfer with Locked-image text Tuning) image and text encoder models, working in the browser ☆11 · May 16, 2022 · Updated 3 years ago
- ☆10 · Apr 10, 2014 · Updated 11 years ago
- Diving into the data behind signs on Illinois highways that say "957 TRAFFIC DEATHS IN 2012." #peoplenotdata ☆16 · Jul 8, 2021 · Updated 4 years ago
- Gathers URLs from Common Crawl ☆34 · Nov 9, 2019 · Updated 6 years ago
- ☆24 · Jan 27, 2026 · Updated last month
- ☆43 · Mar 10, 2023 · Updated 3 years ago
- Google Ticker Stock SVI (TS-SVI) ☆17 · Dec 16, 2024 · Updated last year
- Functions for extracting commonly used linguistic features from text. ☆12 · Nov 2, 2025 · Updated 4 months ago
- A simple, declarative workspace finder ☆20 · Dec 23, 2024 · Updated last year
- A collection of models for TensorFlow Go ☆12 · May 29, 2022 · Updated 3 years ago
- ☆17 · Jun 8, 2019 · Updated 6 years ago
- TypeScript library, MCP, and agent-friendly CLI for the BuiltWith API. ☆19 · Updated this week
- Index URLs in Common Crawl ☆197 · Sep 19, 2017 · Updated 8 years ago
- A whirlwind tour of Common Crawl's data using Python ☆37 · Feb 17, 2026 · Updated last month
- Code for the paper "Refining Language Model with Compositional Explanation" (NeurIPS 2021) ☆11 · Oct 25, 2021 · Updated 4 years ago
- Research project while at the University of Stuttgart ☆11 · Mar 3, 2022 · Updated 4 years ago
- ☆29 · Jul 18, 2022 · Updated 3 years ago
- Streaming WARC/ARC library for fast web archive IO ☆451 · Dec 10, 2024 · Updated last year
- A Python client for Calcbench's API. ☆20 · Dec 10, 2024 · Updated last year
- Get a oneup on RSC by having everything laid out as simply as possible. ☆26 · Mar 14, 2023 · Updated 3 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆47 · Dec 4, 2017 · Updated 8 years ago
- PathVis visualises traceroutes ☆11 · Jan 25, 2024 · Updated 2 years ago
- Code for the paper on 247-CFE procurement ☆10 · Dec 13, 2024 · Updated last year
- A collection and conversion of WARN notices from California ☆12 · May 13, 2016 · Updated 9 years ago
- A Vue rich editor component ☆18 · Dec 11, 2022 · Updated 3 years ago
- ☆10 · Apr 6, 2023 · Updated 2 years ago
- Data Package of ratification status of the Paris Climate Agreement and the emissions shares used for entry into force ☆14 · Feb 13, 2023 · Updated 3 years ago
- The code for Template-GPT-2 Generation Model for Logic2Text Dataset ☆18 · Jun 1, 2020 · Updated 5 years ago
- A client library to retrieve data from Misfit Shine ☆12 · Sep 4, 2015 · Updated 10 years ago
- Snowflake OAuth component for Streamlit ☆14 · Feb 21, 2024 · Updated 2 years ago
- ☆12 · Sep 9, 2022 · Updated 3 years ago
- Curated list of awesome datasets for various table understanding tasks ☆18 · Sep 5, 2025 · Updated 6 months ago
- A Qt5 app that plots timestamped MQTT data – status: unfinished alpha software. ☆10 · May 7, 2022 · Updated 3 years ago
- Single list of government services, as a user would recognise them ☆10 · Apr 8, 2017 · Updated 8 years ago