kotartemiy / topic-labeled-news-dataset
100k+ topic labeled news articles published from thousands of news websites
☆18 Updated 4 years ago
Related projects
Alternatives and complementary repositories for topic-labeled-news-dataset
- A library to extract a publication date from a web page, along with a measure of the accuracy. ☆42 Updated 5 years ago
- Meta-repository for the open-source version of the SUMMA Platform ☆16 Updated 7 months ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- Exploration and charting of world income distribution ☆12 Updated 5 years ago
- A set of tools to accelerate work in Jupyter notebooks. ☆11 Updated 4 years ago
- A curated list of awesome open source tools and commercial products to catalog, version, and manage data 🚀 ☆27 Updated 2 years ago
- NSS Capstone project to use natural language modeling, classification, and information extraction to get the exact employee count values … ☆15 Updated 6 years ago
- A web application for tagging and retrieval of arguments in text ☆30 Updated last year
- This repository auto-configures an Apache Pinot and Superset cluster for analyzing IRA tweets from FiveThirtyEight. ☆11 Updated 4 years ago
- Jupyter notebook + Code for scraping AngelList data and making an interactive chart of SFBA salaries/equity ☆14 Updated 8 years ago
- ☆12 Updated 5 years ago
- The project lets you extract glossary words and their definitions from a given piece of text automatically using NLP techniques ☆29 Updated 4 years ago
- Matrix-based News Aggregation to Explore Media Bias ☆20 Updated 6 years ago
- A curated list of awesome ML frameworks & libraries for text data ☆16 Updated last year
- Code and visualizations for related/similar subreddits ☆19 Updated 8 years ago
- Deployment of pywb as a CommonCrawl Index Server ☆21 Updated 7 years ago
- Jupyter notebook + Code for reproducing Reddit Subreddit graphs ☆16 Updated 8 years ago
- Datasets for hackernews posts ☆16 Updated 2 years ago
- A markdown wiki and dashboarding system for Datasette ☆21 Updated 3 years ago
- Extract statistics from Wikipedia Dump files. ☆26 Updated 3 years ago
- Dump of generated texts from GPT-2 trained on /r/legaladvice subreddit titles ☆23 Updated 5 years ago
- Aviation grade news article metadata extraction ☆36 Updated last year
- Inspect a URL and estimate if it contains a news story ☆39 Updated this week
- Tokenization across languages. Useful as preprocessing for subword tokenization. ☆22 Updated last year
- It finds the best synonyms from Google Books when you press a hotkey ☆30 Updated 9 years ago
- Integration between Reaction ECommerce and Accelerated Text to provide product descriptions for an e-shop. ☆9 Updated 3 years ago
- Documentation effort for the BookCorpus dataset ☆33 Updated 3 years ago
- Wikidata's QRank as a SQLite DB. ☆29 Updated 10 months ago
- Granular Viewer of Sentiments Between Entities in Massively Large Documents and Collections of Texts, powered by AREkit ☆37 Updated this week
- App store search example, using Jina as backend and Streamlit as frontend ☆21 Updated 2 years ago