scalingexcellence / scrapy-solr
A Scrapy pipeline that stores Scrapy items in a Solr server.
☆18Updated 9 years ago
Alternatives and similar repositories for scrapy-solr
Users who are interested in scrapy-solr are comparing it to the libraries listed below.
- A Python library to detect and extract listing data from an HTML page.☆108Updated 8 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs).☆92Updated last week
- A Scrapy pipeline which sends items to an Elasticsearch server☆98Updated 7 years ago
- Scrapes sites. Gets news. Eventually events.☆85Updated 9 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri☆47Updated 3 years ago
- Pipeline for distributed Natural Language Processing, made in Python☆65Updated 8 years ago
- Scrapy downloader middleware that stores response HTMLs to disk.☆18Updated 2 months ago
- Named-Entity Recognition extension for Google Refine / OpenRefine☆73Updated 8 years ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a…☆98Updated 3 years ago
- NER toolkit for HTML data☆259Updated last year
- Automatically extracts and normalizes an online article or blog post publication date☆117Updated 2 years ago
- Find which links on a web page are pagination links☆29Updated 8 years ago
- A python implementation of DEPTA☆83Updated 8 years ago
- Extract place names from a URL or text, and add context to those names -- for example distinguishing between a country, region or city.☆62Updated 8 years ago
- Demo of the Newspaper article extraction library.☆29Updated 10 years ago
- An online sentiment analyzer built with Flask and TextBlob☆15Updated 12 years ago
- Virtual patent marking crawler at iproduct.epfl.ch☆15Updated 8 years ago
- ☆59Updated 4 years ago
- Bulk Copyscape is a script that utilizes Copyscape's API to by-pass the normal bulk upload queue, allowing you to quickly check websites …☆17Updated 2 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi…☆48Updated 3 years ago
- Small demo for a "search-as-you-type" app in AngularJS + Python/Flask + Elasticsearch☆69Updated 8 years ago
- Automatic Item List Extraction☆87Updated 9 years ago
- Python package to detect and return RSS / Atom feeds for a given website. The tool supports major blogging platforms including WordPress, …☆21Updated 3 years ago
- Parse Popolo JSON data and navigate it with Python☆15Updated 5 years ago
- A client interface for Scrapinghub's API☆205Updated 2 weeks ago
- An automated ingestion service for blogs to construct a corpus for NLP research.☆86Updated 7 years ago
- Pure Python script that takes a user query and summarizes news related to it.☆25Updated 3 years ago
- Framework for scraping legislative/government data☆88Updated last year
- Scrape every LinkedIn public profile using Scrapy (Python)☆15Updated 10 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED]☆25Updated 9 years ago