Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
☆57Jan 28, 2024Updated 2 years ago
Alternatives and similar repositories for KeywordAnalysis
Users who are interested in KeywordAnalysis are comparing it to the libraries listed below.
- API - extract a list of keywords from a text.☆18Jul 6, 2017Updated 8 years ago
- LinkRun - Data Engineering project done in 3 weeks during the Insight fellowship☆38Apr 2, 2020Updated 6 years ago
- An attempt to use financial news to predict the stock market☆16Nov 17, 2018Updated 7 years ago
- Simple multi-threaded tool to extract domain-related data from commoncrawl.org☆31Jul 17, 2018Updated 7 years ago
- Problem statement: given a particular PDF or text document, how can keywords be extracted and arranged in order of their weightage using Python?☆21Jan 17, 2022Updated 4 years ago
- Corpus of domain names scraped from Common Crawl and manually annotated to add word boundaries (e.g. "commoncrawl" to "common crawl").☆20Jun 16, 2025Updated 9 months ago
- Extraction code used to create the Dresden Web Table Corpus☆14Feb 25, 2015Updated 11 years ago
- A tiny Python clone of https://archive.org/web/ for your own personal websites.☆15Sep 30, 2020Updated 5 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri☆47Jan 16, 2022Updated 4 years ago
- ☆15Aug 15, 2012Updated 13 years ago
- ☆19Dec 19, 2018Updated 7 years ago
- Detect the text orientation on a page with Tesseract OCR☆14Dec 18, 2020Updated 5 years ago
- Community driven landing page generator for open source projects☆15Jan 25, 2016Updated 10 years ago
- Gathers URLs from Common Crawl☆34Nov 9, 2019Updated 6 years ago
- Use scrapy with a list of proxies generated from proxynova.com☆39Jan 3, 2013Updated 13 years ago
- Tools to construct and process Common Crawl webgraphs☆107Mar 26, 2026Updated 2 weeks ago
- Wikipedia-based keyword extraction tool in Java☆21May 11, 2015Updated 10 years ago
- Code for the paper: Combining Graph Degeneracy and Submodularity for Unsupervised Extractive Summarization☆17Apr 24, 2020Updated 5 years ago
- Language models are open knowledge graphs (unofficial implementation)☆13Jan 17, 2021Updated 5 years ago
- ☆14Sep 22, 2016Updated 9 years ago
- An entity linking prototype, developed using the datasets from the TAC-KBP sub-task.☆28Apr 5, 2017Updated 9 years ago
- Process Common Crawl data with Python and Spark☆453Mar 26, 2026Updated 2 weeks ago
- UBOS administration tools☆16May 30, 2024Updated last year
- Text analysis for automatic bookmarking/keyword extraction☆18Nov 20, 2016Updated 9 years ago
- Joyner Document Format 2.0 (JDF) LaTeX Template☆14Jun 2, 2019Updated 6 years ago
- Subdomain list based on Common Crawl data, sorted by popularity☆17Nov 19, 2019Updated 6 years ago
- EmbedRank implemented in Python.☆15Jun 17, 2024Updated last year
- European Parliament website Python scraper☆12Oct 19, 2016Updated 9 years ago
- Convert PowerPoint (pptx) files into raw text, org, or LaTeX files☆15Aug 28, 2018Updated 7 years ago
- ☆11Sep 27, 2024Updated last year
- An unsupervised text summarization and information retrieval library under the hood using natural language processing models☆15Dec 11, 2020Updated 5 years ago
- Extract images from PowerPoint files☆17Dec 1, 2011Updated 14 years ago
- Scripts for building a geo-located web corpus using Common Crawl data☆11Jan 18, 2026Updated 2 months ago
- Automated generation of PowerPoint slides for fun and profit☆13Oct 18, 2017Updated 8 years ago
- A simple Python wrapper for Yandex's Tomita Parser (no longer needed, since Yandex has released the source code)☆17Nov 26, 2015Updated 10 years ago
- Tools for compiling corpora from Common Crawl☆14Nov 24, 2024Updated last year
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts☆59Sep 5, 2012Updated 13 years ago
- A simple machine learning package to cluster keywords in higher-level groups.☆17Jul 6, 2022Updated 3 years ago
- Semantic Parser with Execution☆13Dec 8, 2017Updated 8 years ago