LukasKriesch / CommonCrawlNewsDataSet
This repository contains code to download, extract, filter and geocode news articles from the Common Crawl News Dataset
☆24Updated 8 months ago
Alternatives and similar repositories for CommonCrawlNewsDataSet
Users that are interested in CommonCrawlNewsDataSet are comparing it to the libraries listed below
- Samples of Entando applications☆12Updated 3 years ago
- Integrated LMS application, developed by ICCU, built on a J2EE architecture and using exclusively free and open-source software☆12Updated last year
- A high-throughput ontology-based pipeline for data integration☆14Updated 2 years ago
- Archiving and transforming official Italian General Election text-only polls into machine readable data using Large Language Models☆16Updated this week
- The 2nd consultation on eForms, the update to the EU's procurement standard forms. Scroll down for more information.☆20Updated 6 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)☆65Updated 9 years ago
- ☆10Updated 9 years ago
- ☆40Updated 7 years ago
- Linked SDMX☆17Updated 11 years ago
- Language models are open knowledge graphs (unofficial implementation)☆13Updated 5 years ago
- The Toxic Comment Classification project is an application that uses deep learning to identify toxic comments as toxic, severe toxic, obs…☆16Updated 2 years ago
- A step-by-step tutorial for publishing data and an ontology as Linked Data on your machine.☆14Updated 2 years ago
- RDF Community Discussions. Ask anything here!☆13Updated last year
- [0.9.9 Released] A high performance non-SPARQL based RDF data cube validator☆16Updated 9 years ago
- Convert Wikidata Items to vector embeddings☆33Updated 4 months ago
- Lehigh University Benchmark (LUBM).☆10Updated 5 years ago
- SPARQL-LD: A SPARQL Extension for Fetching and Querying Linked Data☆17Updated 2 years ago
- BEACON link dump format specification☆17Updated 8 years ago
- The Open Data Standards Directory is an initiative to provide an inventory of information regarding open data standards. This site is opera…☆28Updated 5 months ago
- A TypeScript library for building applications with RDF graph data.☆12Updated 2 months ago
- KnowledgeStore☆21Updated 8 years ago
- TellMeFirst is a tool for classifying and enriching textual documents via Linked Open Data.☆25Updated 3 years ago
- Repository for the Procedural Knowledge Ontology (PKO)☆27Updated last month
- Wikipedia Tools for Google Spreadsheets — Install:☆157Updated last year
- Linked Data to Natural Language☆11Updated 2 years ago
- DoCO, the Document Components Ontology, is an ontology for describing the component parts of a bibliographic document. It forms part of S…☆14Updated 6 years ago
- 📚 CORE ontology of ML-Schema and mapping to other machine learning vocabularies and ontologies (DMOP, Exposé, OntoDM, and MEX)☆29Updated 5 years ago
- The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.☆152Updated last month
- PAV - Provenance Authoring and Versioning ontology☆23Updated 4 months ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆25Updated 8 years ago