commoncrawl / cc-notebooks
Various Jupyter notebooks about Common Crawl data
☆54 Updated 3 months ago
Alternatives and similar repositories for cc-notebooks
Users who are interested in cc-notebooks are comparing it to the libraries listed below
- Tools to construct and process Common Crawl webgraphs ☆92 Updated last week
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution. ☆59 Updated last month
- TextGraphs + LLMs + graph ML for entity extraction, linking, ranking, and constructing a lemma graph ☆24 Updated last year
- Fastlaw's purpose is to replace generic word embeddings for work on supervised machine learning NLP-tasks with legal texts. ☆38 Updated 6 years ago
- Scripts to load the GDELT data set into MongoDB ☆12 Updated 2 years ago
- A News Article Collection Library ☆22 Updated 2 years ago
- Aim-spaCy integration ☆34 Updated last year
- Scraping Assisted by Learning ☆35 Updated 2 months ago
- Awesome Orchest projects, both official and submitted by the community. ☆25 Updated last year
- Package to help with scientific literature research ☆25 Updated 2 years ago
- Search COVID-19 Open Research Dataset (CORD-19) using Vespa - the open source big data serving engine. ☆38 Updated last week
- XAI based human-in-the-loop framework for automatic rule-learning. ☆49 Updated last year
- Open Access PDF harvester ☆40 Updated last year
- Graph databases, Knowledge Graphs, SPARQ ☆80 Updated 3 years ago
- Pytorch implementation of a BiLSTM model for the Wikification project. ☆19 Updated 5 years ago
- GraphiPy: Universal Social Data Extractor ☆84 Updated 2 years ago
- Analyze trends in articles published on arXiv ☆17 Updated 2 years ago
- Python based Wikidata framework for easy dataframe extraction ☆45 Updated last year
- A personal knowledge base that I can dump information to and help me learn ☆24 Updated last month
- Modelling Big Five Personality Inventory using Machine Learning algorithms ☆22 Updated 8 months ago
- Common crawl extractor ☆77 Updated last year
- spaCy entry points for Curated Transformers ☆31 Updated last month
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆41 Updated 5 years ago
- Building a Job Dataset ☆22 Updated 3 years ago
- Create and customize your own Periodic Table. With the help of Streamlit and Bokeh. ☆18 Updated 2 years ago
- This Python package can be used to systematically extract multiple data elements (e.g., title, keywords, text) from news sources around t… ☆33 Updated 2 years ago
- Example scripts that showcase how to use Private AI Text to de-identify, redact, hash, tokenize, mask and synthesize PII in text. ☆82 Updated 2 months ago
- Hosting examples of interactive datamapplot output ☆25 Updated 3 weeks ago
- [archived] ☆18 Updated 3 years ago
- GPT-3 attempts to predict & balance chemical reactions ☆13 Updated 4 years ago