BernhardWenzel / google-taxonomy-matcher
Matches a category of Google's Taxonomy to a product that is described in any kind of text data
☆57Updated 6 years ago
Related projects:
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends☆57Updated 7 months ago
- Classify a job description (or noisy job title) into an ONET job title☆17Updated 7 years ago
- API - extract a list of keywords from a text.☆18Updated 7 years ago
- Google Cloud Storage connector, pre-processor and model for predicting user search intent based on keywords☆22Updated 4 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…☆48Updated 2 years ago
- Scrape all the pages and links of a given domain and write the results to Google Cloud BigQuery.☆35Updated 4 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs).☆90Updated 2 years ago
- Parsing resumes in PDF format from LinkedIn☆65Updated 7 years ago
- Keywords enrichment by autocompletion (AWS, PM, RDC, CDS, ...), google suggestion scraping Heavy multithreaded semantic corpus crawler S…☆12Updated 9 years ago
- ☆23Updated this week
- Virtual patent marking crawler at iproduct.epfl.ch☆14Updated 7 years ago
- 📊 Repository for the study on 11.8 Million Google Search Results☆23Updated 4 years ago
- Cloud crawler functions for scrapeulous☆44Updated 3 years ago
- Text analysis for automatic bookmarking/keyword extraction☆18Updated 7 years ago
- Bulk Copyscape is a script that utilizes Copyscape's API to bypass the normal bulk upload queue, allowing you to quickly check websites …☆17Updated last year
- A Python library to detect and extract listing data from an HTML page.☆109Updated 7 years ago
- A collection of tools to (semi-)automatically collect and analyze data from online discussions on Facebook groups and pages.☆33Updated 8 years ago
- A concurrent crawler that minimizes memory use. Output suitable for use with BigQuery.☆20Updated 4 years ago
- Create a visual search engine using tensorflow serving, elasticsearch, vuejs and nginx.☆51Updated 5 years ago
- Node.js application to extract the knowledge represented in Google infoboxes (aka Google Knowledge Graph Panel)☆25Updated 7 years ago
- ☆91Updated this week
- Python program to analyze a resume (Word document) using a generated set of keywords☆15Updated 8 years ago
- A toolkit for clustering web pages based on various similarity measures.☆32Updated 2 years ago
- Similarity search on Wikipedia using gensim in Python.☆61Updated 5 years ago
- Pythonic wrapper of the Google AdWords API for easy reporting.☆19Updated 6 years ago
- ngram graphs library☆11Updated 2 years ago
- Atlas: A Dataset and Benchmark for E-commerce Clothing Product Categorization☆73Updated last year
- Source real estate prices from the Common Crawl.☆27Updated 5 years ago
- Site Hound (previously THH) is a Domain Discovery Tool☆23Updated 3 years ago
- Console program to get global ranking for a given website or domain☆20Updated last year