psolbach / metadoc

Aviation grade news article metadata extraction

☆36

Alternatives and similar repositories for metadoc

Users who are interested in metadoc are comparing it to the libraries listed below.

scrapinghub / mdr
A Python library to detect and extract listing data from an HTML page.
☆108 Updated 8 years ago
istresearch / traptor
Traptor -- A distributed Twitter feed
☆26 Updated 3 years ago
Webhose / article-date-extractor
Automatically extracts and normalizes an online article or blog post publication date
☆117 Updated 2 years ago
CI-Research / KeywordAnalysis
Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends
☆57 Updated last year
Parsely / serpextract
Easy extraction of keywords and engines from search engine results pages (SERPs).
☆91 Updated 3 years ago
tasdikrahman / spammy
Spam filtering made easy for you
☆144 Updated 6 years ago
0b01 / bodine
It finds best synonyms from Google Books when you press a hotkey
☆30 Updated 10 years ago
scrapinghub / webpager
Paginating the web
☆37 Updated 11 years ago
adamfabish / Reduction
Reduction is a python script which automatically summarizes a text by extracting the sentences which are deemed to be most important.
☆54 Updated 10 years ago
tonywangcn / scaleable-crawler-with-docker-cluster
A scalable and efficient crawler with a Docker cluster; crawl a million pages in 2 hours with a single machine
☆97 Updated last year
cocrawler / cocrawler
CoCrawler is a versatile web crawler built using modern tools and concurrency.
☆189 Updated 3 years ago
bexp / textai
REST API for Text Summarization and Keywords Extraction
☆16 Updated 2 years ago
TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆23 Updated 4 years ago
TeamHG-Memex / Formasaurus
Formasaurus tells you the type of an HTML form and its fields using machine learning
☆119 Updated last year
ayoungprogrammer / Lango
Language Lego
☆141 Updated 5 years ago
trivio / common_crawl_index
Index URLs in Common Crawl
☆195 Updated 8 years ago
scrapinghub / aduana
Frontera backend to guide a crawl using PageRank, HITS or other ranking algorithms based on the link structure of the web graph, even whe…
☆55 Updated last year
TeamHG-Memex / deep-deep
Adaptive crawler which uses Reinforcement Learning methods
☆168 Updated 7 years ago
mikelynn2 / sentimentAPI
A fast python scikit-learn text sentiment API server.
☆89 Updated 9 years ago
xtannier / WebAnnotator
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48 Updated 3 years ago
christabor / namebot
A company/project name generator for Python. Uses NLTK and diverse techniques derived from existing corporate etymologies and naming agen…
☆50 Updated 8 years ago
pcbje / gransk
Document processing for investigations
☆249 Updated 8 years ago
pydepta / pydepta
A python implementation of DEPTA
☆83 Updated 8 years ago
TeamHG-Memex / scrapy-dockerhub
[UNMAINTAINED] Deploy, run and monitor your Scrapy spiders.
☆11 Updated 10 years ago
rmax / databrewer
The missing datasets manager. Like Homebrew but for datasets. CLI tool to search and discover datasets!
☆41 Updated 8 years ago
kootenpv / sky
next generation web crawling using machine intelligence
☆331 Updated 2 years ago
wordnik / serapis
Serapis is a sentence identifier and modeling pipeline / built for Wordnik
☆24 Updated 9 years ago
spro / nalgene
Natural language generation language
☆55 Updated 6 years ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259 Updated last year
scrapinghub / aile
Automatic Item List Extraction
☆87 Updated 9 years ago