scalingexcellence / scrapy-solr
A Scrapy pipeline that stores Scrapy items in a Solr server.
☆18 · Updated 9 years ago
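A Scrapy item pipeline is a class whose `process_item(item, spider)` method receives every scraped item, so storing items in Solr comes down to turning each item into a document and sending it to Solr's update handler. What follows is a minimal, hypothetical sketch of that idea. It is not the scrapy-solr code or API. The class name `SolrPipeline`, the setting names `SOLR_URL` and `SOLR_BATCH_SIZE`, and the injectable `sender` callable are all assumptions made for illustration. The sketch buffers items and POSTs them as a JSON array to `<core URL>/update?commit=true`. It uses only the standard library, so the logic can be checked without Scrapy or a running Solr instance.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List


def post_json(url: str, docs: List[Dict[str, Any]]) -> None:
    """POST a JSON array of documents to a Solr update handler."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        resp.read()  # drain the response; urlopen raises HTTPError on 4xx/5xx


class SolrPipeline:
    """Hypothetical Scrapy item pipeline that batches items into Solr.

    Not the scrapy-solr API; SOLR_URL and SOLR_BATCH_SIZE are assumed setting names.
    """

    def __init__(
        self,
        solr_url: str,
        batch_size: int = 100,
        sender: Callable[[str, List[Dict[str, Any]]], None] = post_json,
    ) -> None:
        # commit=true makes documents searchable right after each flush.
        self.update_url = solr_url.rstrip("/") + "/update?commit=true"
        self.batch_size = batch_size
        self.sender = sender  # injectable so the logic can be tested offline
        self.buffer: List[Dict[str, Any]] = []

    @classmethod
    def from_crawler(cls, crawler):
        # Scrapy calls this hook to build the pipeline from project settings.
        settings = crawler.settings
        return cls(settings.get("SOLR_URL"), settings.getint("SOLR_BATCH_SIZE", 100))

    def process_item(self, item, spider):
        # dict() works for plain dicts and scrapy.Item instances alike.
        self.buffer.append(dict(item))
        if len(self.buffer) >= self.batch_size:
            self.flush()
        return item  # hand the item on to any later pipelines

    def close_spider(self, spider):
        # Send whatever is still buffered when the crawl ends.
        self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.sender(self.update_url, self.buffer)
            self.buffer = []
```

In a Scrapy project such a class would be enabled through `ITEM_PIPELINES`, e.g. `{"myproject.pipelines.SolrPipeline": 300}` together with `SOLR_URL = "http://localhost:8983/solr/mycore"` (module path and core name are hypothetical). Batching trades a little latency for far fewer HTTP requests. Committing on every flush keeps the example simple but can be expensive on large crawls.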
Alternatives and similar repositories for scrapy-solr
Users who are interested in scrapy-solr are comparing it to the libraries listed below.
- A Scrapy pipeline that sends items to an Elasticsearch server · ☆98 · Updated 7 years ago
- A Python library that detects and extracts listing data from HTML pages · ☆108 · Updated 8 years ago
- ☆59 · Updated 3 years ago
- Scrapes sites. Gets news. Eventually events · ☆87 · Updated 9 years ago
- Automatically extracts and normalizes the publication date of an online article or blog post · ☆117 · Updated 2 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs) · ☆90 · Updated 3 years ago
- Some Scrapy and web.py examples · ☆79 · Updated 8 years ago
- A Python implementation of DEPTA · ☆83 · Updated 8 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search using RDF(S) ontologies and SKOS thesauri · ☆47 · Updated 3 years ago
- Python/Django-based web apps and web user interfaces for search, structure (metadata management like thesaurus, ontologies, annotations a… · ☆99 · Updated 2 years ago
- Find which links on a web page are pagination links · ☆29 · Updated 8 years ago
- Word analysis, by domain, on the Common Crawl dataset to find industry trends · ☆57 · Updated last year
- Scrapy downloader middleware that stores response HTML to disk · ☆18 · Updated 3 weeks ago
- A small set of utilities to simplify writing Scrapy spiders · ☆49 · Updated 10 years ago
- A list of libraries, tools, and APIs for web scraping and data processing · ☆13 · Updated 9 years ago
- A REST server endpoint built with Flask and Python · ☆24 · Updated 2 years ago
- Python bindings to the Compact Language Detector · ☆33 · Updated 5 years ago
- Demo of the Newspaper article extraction library · ☆29 · Updated 10 years ago
- Record Linkage ToolKit (find and link entities) · ☆110 · Updated 2 years ago
- Named-entity recognition extension for Google Refine / OpenRefine · ☆73 · Updated 8 years ago
- Scrapes public information from LinkedIn · ☆111 · Updated 9 years ago
- Automatic Item List Extraction · ☆87 · Updated 9 years ago
- Bulk Copyscape is a script that uses Copyscape's API to bypass the normal bulk upload queue, allowing you to quickly check websites … · ☆17 · Updated 2 years ago
- NER toolkit for HTML data · ☆259 · Updated last year
- A pure-Python script that takes a user query and summarizes news related to it · ☆25 · Updated 3 years ago
- A Django-based application for creating, deploying, and running Scrapy spiders in a distributed manner · ☆113 · Updated 7 years ago
- Paginating the web · ☆37 · Updated 11 years ago
- A simple OpenRefine reconciliation service that runs on top of a CSV file · ☆121 · Updated 10 years ago
- Detect and classify pagination links · ☆15 · Updated 4 years ago
- API to extract a list of keywords from text · ☆18 · Updated 8 years ago