Various Jupyter notebooks about Common Crawl data
☆66Nov 22, 2025Updated 6 months ago
Alternatives and similar repositories for cc-notebooks
Users that are interested in cc-notebooks are comparing it to the libraries listed below.
- Index Common Crawl archives in tabular format☆128Jun 4, 2026Updated last week
- Process Common Crawl data with Python and Spark☆455Mar 26, 2026Updated 2 months ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)☆66Aug 5, 2016Updated 9 years ago
- Tools to construct and process Common Crawl webgraphs☆108May 25, 2026Updated 2 weeks ago
- Deployment of pywb as a CommonCrawl Index Server☆21Oct 6, 2017Updated 8 years ago
- Gathers urls from common crawl☆35Nov 9, 2019Updated 6 years ago
- ☆27Apr 1, 2026Updated 2 months ago
- ☆14Mar 14, 2024Updated 2 years ago
- Graphs.jl-formatted graph files taken from the SNAP Datasets collection.☆17Mar 12, 2026Updated 3 months ago
- Is In-Context Learning Sufficient for Instruction Following in LLMs? [ICLR 2025]☆33Jan 23, 2025Updated last year
- web app console/dashboard/spreadsheet thingy for dat☆47Oct 1, 2014Updated 11 years ago
- Index URLs in Common Crawl☆197Sep 19, 2017Updated 8 years ago
- Code for the paper "Refining Language Model with Compositional Explanation" (NeurIPS 2021)☆11Oct 25, 2021Updated 4 years ago
- MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation☆18Sep 2, 2024Updated last year
- A whirlwind tour of Common Crawl's data using Python☆45Apr 13, 2026Updated 2 months ago
- CascadER: Cross-Modal Cascading for Knowledge Graph Link Prediction (arXiv 22)☆13Jun 17, 2022Updated 3 years ago
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project.☆42Nov 19, 2024Updated last year
- Streaming WARC/ARC library for fast web archive IO☆458Updated this week
- 🐴🐘 Data on Members of the 116th U.S. Congress☆10Dec 11, 2019Updated 6 years ago
- ☆46Jan 26, 2020Updated 6 years ago
- Get a oneup on RSC by having everything laid out as simply as possible.☆26Mar 14, 2023Updated 3 years ago
- ☆16Apr 30, 2026Updated last month
- The code for Template-GPT-2 Generation Model for Logic2Text Dataset☆18Jun 1, 2020Updated 6 years ago
- Curated list of awesome datasets for various table understanding tasks☆19Sep 5, 2025Updated 9 months ago
- An open-source platform to demonstrate the capabilities of a Granular Certificate registry that conforms to the EnergyTag Standards and A…☆13Mar 18, 2026Updated 2 months ago
- Code for the paper "Spatio-temporal load shifting for truly clean computing"☆15Feb 4, 2025Updated last year
- DuckDB Wasm Datasource Plugin☆15Feb 7, 2024Updated 2 years ago
- Python helpers for using AWS API Gateway / Lambda "serverless"☆13Jan 19, 2019Updated 7 years ago
- This API was made for "https://koinim.com/". The company doesn't have an API, but if you want to use a koinim API, use this script. It works splinter…☆10Nov 23, 2014Updated 11 years ago
- Python SDK Client for ZincSearch☆10Sep 21, 2022Updated 3 years ago
- An environment for benchmarking commonsense agents☆29Aug 19, 2020Updated 5 years ago
- ☆46Apr 13, 2022Updated 4 years ago
- ☆13Sep 12, 2024Updated last year
- Add more items to Mozilla Add-on Manager context menu.☆13Mar 4, 2016Updated 10 years ago
- ☆17Apr 1, 2023Updated 3 years ago
- Atom package for text manipulation commands☆13Jan 19, 2016Updated 10 years ago
- ☆12Mar 5, 2021Updated 5 years ago
- Upload SQLite database files to Datasette☆14Nov 10, 2025Updated 7 months ago
- Nvidia GPU Fan Controller for linux☆15May 27, 2024Updated 2 years ago