Various Jupyter notebooks about Common Crawl data
☆64 Nov 22, 2025 Updated 3 months ago
Alternatives and similar repositories for cc-notebooks
Users who are interested in cc-notebooks are comparing it to the libraries listed below
- ☆10 Apr 10, 2014 Updated 11 years ago
- A collection of scripts and tools for analyzing SWE agents. ☆15 May 7, 2025 Updated 9 months ago
- Scientific articles using or citing Common Crawl data ☆28 Jan 9, 2026 Updated last month
- ☆12 Nov 29, 2021 Updated 4 years ago
- ☆23 Jan 27, 2026 Updated last month
- Scraper of ResetEra threads and posts to get them into a format suitable for feeding them into GPT-2. ☆15 Jun 20, 2019 Updated 6 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Aug 5, 2016 Updated 9 years ago
- A whirlwind tour of Common Crawl's data using Python ☆35 Feb 17, 2026 Updated last week
- This repository contains expert evaluation interface and data evaluation script for the OpenScholar project. ☆36 Nov 19, 2024 Updated last year
- A set of samples and notes for different approaches using the metering service with managed applications. ☆29 Mar 5, 2024 Updated last year
- GPI-Space: Memory Driven Computing and Big Data ☆10 Jan 2, 2025 Updated last year
- The DITAP Curriculum Update aims to modernize the DITAP training program by auditing and updating its content, resources, and assessment … ☆12 Oct 31, 2025 Updated 4 months ago
- Experimenting with LLMs to Research, Reflect, and Plan (LLM assistants, retrieval, and Discord integration) ☆33 Jul 7, 2024 Updated last year
- C++ code and MATLAB utilities for loading patterns onto TI DLP Digital Micromirror Device (DMD) ☆14 Dec 19, 2020 Updated 5 years ago
- React-Autosuggest for Plotly Dash with Elasticsearch integration. ☆12 Dec 3, 2022 Updated 3 years ago
- Self-evaluating RAG application on LangCheck docs ☆11 Sep 10, 2025 Updated 5 months ago
- EOSIO-Taurus - The Most Powerful Infrastructure for Decentralized Applications ☆13 Mar 29, 2024 Updated last year
- ☆12 Oct 23, 2020 Updated 5 years ago
- MATLAB/Octave generator of Hamming ECC coding. Output format is Verilog HDL. ☆12 Dec 27, 2022 Updated 3 years ago
- LC6500DMD python control ☆11 Nov 15, 2016 Updated 9 years ago
- Single list of government services, as a user would recognise them ☆10 Apr 8, 2017 Updated 8 years ago
- A distilled DeepSeek-R1 variant built on Qwen2.5-32B, fine-tuned with curated data for enhanced performance and efficiency. <metadata> gp… ☆16 Mar 11, 2025 Updated 11 months ago
- scrape web content into readable markdown for llms and human readers ☆10 Feb 19, 2024 Updated 2 years ago
- A repository for the SRN documents database API ☆14 Feb 24, 2025 Updated last year
- Data Package of ratification status of the Paris Climate Agreement and the emissions shares used for entry into force ☆14 Feb 13, 2023 Updated 3 years ago
- An R Package for the Financial Modeling Prep Financial Data API ☆13 Aug 17, 2021 Updated 4 years ago
- ☆10 May 17, 2022 Updated 3 years ago
- Curated list of awesome datasets for various table understanding tasks ☆18 Sep 5, 2025 Updated 5 months ago
- Amplify your coding capabilities with AI - your smart co-pilot for an elevated coding experience. ☆14 Feb 18, 2026 Updated last week
- Simple getting started procedure for SciCat ☆11 Updated this week
- IonQ iQuHACK 2024 Remote Challenge ☆11 Feb 3, 2024 Updated 2 years ago
- ☆14 Aug 19, 2025 Updated 6 months ago
- ☆16 Updated this week
- ☆21 Updated this week
- Code for the paper "Learning noise-induced transitions by multi-scaling reservoir computing" (https://arxiv.org/abs/2309.05413). ☆12 Mar 20, 2025 Updated 11 months ago
- Didactic Web crawler for Web Search Engines (CS 6913) course at NYU ☆10 Dec 8, 2022 Updated 3 years ago
- Light Cube using PYNQ ☆10 Aug 4, 2018 Updated 7 years ago
- 🐴🐘 Data on Members of the 116th U.S. Congress ☆10 Dec 11, 2019 Updated 6 years ago
- A NOMAD plugin containing base sections for material processing. ☆11 Jan 20, 2026 Updated last month