mike0sv / Reuters-full-data-set
Full dataset of Reuters composed of 8,551,441 news titles, links and timestamps (Jan 2007 - Aug 2016).
☆22Updated 8 years ago
Alternatives and similar repositories for Reuters-full-data-set
Users who are interested in Reuters-full-data-set are comparing it to the repositories listed below
- An end-to-end event extraction and summarization system.☆22Updated 4 years ago
- This project lets you automatically extract glossary words and their definitions from a given piece of text using NLP techniques☆29Updated 4 years ago
- Tool for sentiment analysis annotation☆12Updated 3 months ago
- 📄Neural Sentential Paraphrase Generation to Augment Chatbot Training Dataset☆21Updated 2 years ago
- Simple and clean Python implementation of TextRank as per seminal paper by Rada Mihalcea and Paul Tarau. This implementation performs bot…☆11Updated 4 years ago
- Extracting narrative timelines (i.e. order and timing of events) from text☆20Updated 6 years ago
- A repository for web crawling, information extraction, and knowledge graph construction.☆33Updated 7 years ago
- Analyze and extract Wikipedia article text and attributes and store them into an ElasticSearch index or to json files (multilingual suppo…☆47Updated last year
- Build intelligent data-driven applications with minimal effort. Sentence Clustering, Topics Extraction, Text Similarity, Opinion Summariz…☆40Updated 5 years ago
- Extraction of the five journalistic W-questions (5W) from news articles☆19Updated 7 years ago
- Similarity search on Wikipedia using gensim in Python.☆60Updated 6 years ago
- A set of tools for leveraging pre-trained embeddings, active learning and model explainability for efficient document classification☆29Updated 5 months ago
- Uses topic modeling to identify context between follower relationships of Twitter users☆62Updated last month
- OKR: A Consolidated Open Knowledge Representation for Multiple Texts☆41Updated 7 years ago
- SENTiVENT: Company-specific event detection in economic news☆24Updated 7 years ago
- WordNet Domains, WordNet Affect and SentiWords☆48Updated 9 years ago
- A Large Automatically-Constructed Resource of Predicate Paraphrases☆45Updated 5 years ago
- A Multilingual Latent Dirichlet Allocation (LDA) Pipeline with Stop Words Removal, n-gram features, and Inverse Stemming, in Python.☆85Updated last year
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts☆59Updated 12 years ago
- Interpretable feature construction from taxonomies for text classification☆18Updated 3 years ago
- Google News and Leo Tolstoy: Visualizing Word2Vec Word Embeddings using t-SNE.☆77Updated 6 years ago
- NLP: Relation extraction with position-aware self-attention transformer☆66Updated 2 years ago
- Repo for EMNLP 2020 paper, "Improving Neural Topic Models using Knowledge Distillation"☆31Updated 4 years ago
- Robust and Memory Efficient Event Detection and Tracking in Large News Feeds☆11Updated 3 years ago
- DeFactoNLP: An Automated Fact-checking System that uses Named Entity Recognition, TF-IDF vector comparison and Decomposable Attention mod…☆41Updated 5 years ago
- Sentence embeddings for unsupervised event detection in the Twitter stream: study on English and French corpora☆31Updated 4 months ago
- The News Landscape Toolkit (NELA)☆15Updated 4 years ago
- This repository contains the DFKI Product Corpus, a dataset of 174 documents annotated for product and company named entities, and the re…☆12Updated 10 months ago
- Fine tune GPT-2 with your favourite authors☆72Updated last year
- Simple script to query Google's Knowledge Graph API.☆41Updated 9 years ago