scalingexcellence / scrapy-solr

A Scrapy pipeline that stores Scrapy items in a Solr server.

☆18

Alternatives and similar repositories for scrapy-solr

Users who are interested in scrapy-solr are comparing it to the libraries listed below.

scrapinghub / mdr
A Python library to detect and extract listing data from HTML pages.
☆108 Updated 8 years ago
Parsely / serpextract
Easy extraction of keywords and engines from search engine results pages (SERPs).
☆92 Updated last week
julien-duponchelle / scrapy-elasticsearch
A Scrapy pipeline which sends items to an Elasticsearch server
☆98 Updated 7 years ago
openeventdata / scraper
Scrapes sites. Gets news. Eventually events.
☆85 Updated 9 years ago
opensemanticsearch / solr-ontology-tagger
Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri
☆47 Updated 3 years ago
NAMD / pypln.backend
Pipeline for distributed Natural Language Processing, made in Python
☆65 Updated 8 years ago
povilasb / scrapy-html-storage
Scrapy downloader middleware that stores response HTMLs to disk.
☆18 Updated 2 months ago
RubenVerborgh / Refine-NER-Extension
Named-Entity Recognition extension for Google Refine / OpenRefine
☆73 Updated 8 years ago
opensemanticsearch / open-semantic-search-apps
Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a…
☆98 Updated 3 years ago
scrapinghub / webstruct
NER toolkit for HTML data
☆259 Updated last year
Webhose / article-date-extractor
Automatically extracts and normalizes an online article or blog post publication date
☆117 Updated 2 years ago
scrapinghub / page_finder
Find which links on a web page are pagination links
☆29 Updated 8 years ago
pydepta / pydepta
A Python implementation of DEPTA
☆83 Updated 8 years ago
Corollarium / geograpy2
Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.
☆62 Updated 8 years ago
codelucas / newspaper-demo
Demo of the Newspaper article extraction library.
☆29 Updated 10 years ago
sloria / textfeel-web
An online sentiment analyzer built with Flask and TextBlob
☆15 Updated 12 years ago
iproduct-database / vpm-filter-spark
Virtual patent marking crawler at iproduct.epfl.ch
☆15 Updated 8 years ago
vu3jej / scrapy-corenlp
☆59 Updated 4 years ago
justinmichaelvieira / BulkCopyscape
Bulk Copyscape is a script that utilizes Copyscape's API to bypass the normal bulk upload queue, allowing you to quickly check websites …
☆17 Updated 2 years ago
xtannier / WebAnnotator
WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…
☆48 Updated 3 years ago
bonzanini / CheerMeApp-demo
Small demo for a "search-as-you-type" app in AngularJS + Python/Flask + Elasticsearch
☆69 Updated 8 years ago
scrapinghub / aile
Automatic Item List Extraction
☆87 Updated 9 years ago
kaflesudip / grabfeed
Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platforms including WordPress, …
☆21 Updated 3 years ago
everypolitician / everypolitician-popolo-python
Parse Popolo JSON data and navigate it with Python
☆15 Updated 5 years ago
scrapinghub / python-scrapinghub
A client interface for Scrapinghub's API
☆205 Updated 2 weeks ago
DistrictDataLabs / baleen
An automated ingestion service for blogs to construct a corpus for NLP research.
☆86 Updated 7 years ago
abhinavgupta / Extract-News-Summary
Pure Python script that takes a user query and summarizes news related to it.
☆25 Updated 3 years ago
opencivicdata / pupa
Framework for scraping legislative/government data
☆88 Updated last year
kelaraj / linkedin_scrapy
Scrape every LinkedIn public profile using Scrapy (Python)
☆15 Updated 10 years ago
opendata / Legal-Synonyms
A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED]
☆25 Updated 9 years ago