TeamHG-Memex / fortia

[UNMAINTAINED] Firefox addon for Scrapely

☆5

Alternatives and similar repositories for fortia

Users interested in fortia are comparing it to the libraries listed below.


nasa-jpl-memex / topic_space
Topic modeling web application
☆41 · Updated 9 years ago
TeamHG-Memex / scrapy-dockerhub
[UNMAINTAINED] Deploy, run and monitor your Scrapy spiders.
☆11 · Updated 10 years ago
VIDA-NYU / domain_discovery_tool_deprecated
Seed acquisition tool to bootstrap focused crawlers
☆23 · Updated 8 years ago
mitll / MITIE
MITIE: library and tools for information extraction
☆29 · Updated 10 years ago
ContinuumIO / topik
A Topic Modeling toolbox
☆92 · Updated 9 years ago
TeamHG-Memex / MaybeDont
A component that tries to avoid downloading duplicate content
☆27 · Updated 7 years ago
seomoz / mltk
mltk - Moz Language Tool Kit
☆12 · Updated 10 years ago
TeamHG-Memex / url-summary
Show summary of a large number of URLs in a Jupyter Notebook
☆17 · Updated 4 years ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29 · Updated 8 years ago
scrapy-plugins / scrapy-streaming
☆18 · Updated 8 years ago
TeamHG-Memex / extract-html-diff
Extract the difference between two HTML pages
☆32 · Updated 7 years ago
daeilkim / refinery
Refinery - A locally deployable open-source web platform for analysis of large document collections
☆101 · Updated 8 years ago
rspeer / text-as-data
A PyData 2013 talk on straightforward, data-driven ways to handle natural language text in Python.
☆50 · Updated 10 years ago
TeamHG-Memex / sitehound-frontend
Site Hound (previously THH) is a Domain Discovery Tool
☆23 · Updated 4 years ago
turian / pytextpreprocess
Preprocess text for NLP (tokenizing, lowercasing, stemming, sentence splitting, etc.)
☆29 · Updated 14 years ago
nasa-jpl-memex / memex-gate
General Architecture for Text Engineering
☆50 · Updated 9 years ago
trec-kba / streamcorpus
Common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text
☆35 · Updated 8 years ago
bigsnarfdude / machineLearning
POC IDS anomaly detection engine built with iPython notebook, matplotlib, pandas, numpy, scikit-learn, d3.js, hyperloglog implementation,…
☆79 · Updated 10 years ago
turian / topia.termextract
Updates to Zope's keyphrase extractor (forked from 1.1.0)
☆67 · Updated 8 years ago
giantoak / unicorn
Visualization and summarization of a collection of documents.
☆20 · Updated 3 years ago
Alir3z4 / python-sanitize
Bringing sanity to the world of messed-up data
☆66 · Updated 10 years ago
modocache / github-recommendation-engine
Discover repositories you should be following on GitHub.
☆31 · Updated 13 years ago
chrismattmann / nutch-python
Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community.
☆39 · Updated 9 years ago
scrapy-plugins / scrapy-monkeylearn
A Scrapy pipeline to categorize items using MonkeyLearn
☆37 · Updated 8 years ago
nasa-jpl-memex / memex-explorer
Viewers for statistics and dashboarding of Domain Search Engine data
☆124 · Updated 9 years ago
OlivierBlanvillain / crawler
Blog crawler for the blogforever project.
☆22 · Updated 11 years ago
rohithpr / py-web-search
A Python module to fetch and parse results from different search engines.
☆77 · Updated 6 years ago
scrapinghub / exporters
Exporters is an extensible export pipeline library that supports filter, transform and several sources and destinations
☆40 · Updated last year
svenkreiss / databench
Data analysis tool.
☆85 · Updated 2 years ago
sandinmyjoints / visularity
Realtime semantic similarity visualization with gensim, d3.js, and hookbox
☆40 · Updated 11 years ago