LukasKriesch / CommonCrawlNewsDataSet
This repository contains code to download, extract, filter and geocode news articles from the Common Crawl News Dataset
☆22 · Updated 6 months ago
Alternatives and similar repositories for CommonCrawlNewsDataSet
Users interested in CommonCrawlNewsDataSet are comparing it to the repositories listed below.
- Samples of Entando applications ☆12 · Updated 3 years ago
- ☆10 · Updated 9 years ago
- Lehigh University Benchmark (LUBM). ☆10 · Updated 5 years ago
- Convert Wikidata Items to vector embeddings ☆30 · Updated 2 months ago
- TellMeFirst is a tool for classifying and enriching textual documents via Linked Open Data. ☆25 · Updated 3 years ago
- A high-throughput ontology-based pipeline for data integration ☆14 · Updated 2 years ago
- SPARQL-LD: A SPARQL Extension for Fetching and Querying Linked Data ☆17 · Updated 2 years ago
- Generic platform for large scale collaborative planning ☆17 · Updated 2 months ago
- The open-source adapter for working with RDF databases and SPARQL queries in Jupyter notebooks leveraging the yFiles Graphs for Jupyter p… ☆21 · Updated 8 months ago
- RDF Community Discussions. Ask anything here! ☆13 · Updated last year
- Homebase of the IPTC EXTRA project about rule-based text categorization ☆13 · Updated 8 years ago
- Archiving and transforming official Italian General Election text-only polls into machine readable data using Large Language Models ☆16 · Updated this week
- Java library for reading and writing WARC files with a typed API ☆51 · Updated this week
- A Java-based SPARQL query generator ☆12 · Updated last year
- Imports Wiktionary's grammatical data into Wikidata ☆18 · Updated 5 years ago
- Code for my Wikimedia Labs Tools account ☆95 · Updated 3 weeks ago
- DoCO, the Document Components Ontology, is an ontology for describing the component parts of a bibliographic document. It forms part of S… ☆13 · Updated 6 years ago
- PAV - Provenance Authoring and Versioning ontology ☆23 · Updated 2 months ago
- An HTTP proxy for Elasticsearch, Solr (etc.) to prevent a 100% full disk situation. ☆11 · Updated 7 years ago
- Linked SDMX ☆17 · Updated 11 years ago
- The code base of the front-end of nocodefunctions.com ☆40 · Updated 2 months ago
- Wikipedia Tools for Google Spreadsheets. Install: ☆155 · Updated last year
- GitHub mirror of "wikidata/query/rdf"; our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access… ☆151 · Updated 2 months ago
- A step-by-step tutorial for publishing data and an ontology as Linked Data on your machine. ☆14 · Updated 2 years ago
- Fixed and optimized OMI polygons from Agenzia delle Entrate ☆29 · Updated 5 months ago
- A web application that interfaces between openalex.org and Gephi ☆11 · Updated 3 months ago
- Repo for the Wikimedia Listeria bot ☆27 · Updated this week
- Language models are open knowledge graphs (unofficial implementation) ☆13 · Updated 4 years ago
- KnowledgeStore ☆21 · Updated 7 years ago
- EduCOR: An Educational and Career-Oriented Recommendation Ontology ☆12 · Updated 4 years ago