commoncrawl / cc-notebooks
Various Jupyter notebooks about Common Crawl data
☆53Updated 2 months ago
Alternatives and similar repositories for cc-notebooks
Users that are interested in cc-notebooks are comparing it to the libraries listed below
- Tools to construct and process Common Crawl webgraphs☆91Updated last week
- Curated list of awesome software and resources for Senzing, The First Real-Time AI for Entity Resolution.☆58Updated last month
- arXiv plain text extraction☆41Updated 2 years ago
- ☆78Updated 2 years ago
- PyTorch implementation of a BiLSTM model for the Wikification project.☆19Updated 5 years ago
- Streamlit demo app to demonstrate the features of transformers interpret with multiple models.☆25Updated 3 years ago
- Factored Cognition Primer: How to write compositional language model programs☆49Updated 2 years ago
- Vespa application making an index of the CORD-19 dataset.☆39Updated 4 months ago
- A simple web application for searching Word2Vec embeddings derived from approximately 2,000 law reports published by The Incorporated…☆26Updated 2 years ago
- 🌸 Train floret vectors☆18Updated 2 years ago
- Deployment of pywb as a CommonCrawl Index Server☆21Updated 7 years ago
- Graph databases, Knowledge Graphs, SPARQL☆81Updated 3 years ago
- From Dataset Labeling, Entity Extraction to production Knowledge Graph Deployment: The Power of NLP and LLMs Combined.☆12Updated last year
- XAI based human-in-the-loop framework for automatic rule-learning.☆49Updated 10 months ago
- Transform a corpus of text documents (any kind) into a map with different zoom levels and topics names to summarise sub corpus of similar…☆29Updated last year
- spaCy entry points for Curated Transformers☆31Updated last week
- DEPRECATED--all functionality moved to nbdev☆15Updated 2 years ago
- ☆30Updated 2 years ago
- Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine☆31Updated 3 years ago
- 🤗 Disaggregators: Curated data labelers for in-depth analysis.☆66Updated 2 years ago
- A simple converter from spaCy Entities (Spans) to Hugging Face BILOU formatted data (tokens and ner_tags)☆14Updated 8 months ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values …☆15Updated 6 years ago
- Aim-spaCy integration☆34Updated last year
- A library to extract a publication date from a web page, along with a measure of the accuracy.☆41Updated 5 years ago
- Summary Explorer is a tool to visually explore the state-of-the-art in text summarization.☆44Updated last year
- ☆70Updated 2 years ago
- Dataset and code for three Web crawling-related papers from SIGIR-2019, NeurIPS-2019, and ICML-2020.☆40Updated 4 months ago
- A collection of utilities for writing labeling functions, transformation functions, and slicing functions.☆21Updated 5 years ago
- Highly concurrent and fast content processing for Mighty Inference Server☆10Updated 2 years ago
- Our open source implementation of MiniLMv2 (https://aclanthology.org/2021.findings-acl.188)☆61Updated last year