scalingexcellence / scrapy-solr
Scrapy pipeline that lets you store Scrapy items in a Solr server.
☆19 Updated 9 years ago
Alternatives and similar repositories for scrapy-solr:
Users who are interested in scrapy-solr are comparing it to the libraries listed below.
- Small set of utilities to simplify writing Scrapy spiders. ☆49 Updated 9 years ago
- Easy extraction of keywords and engines from search engine results pages (SERPs). ☆90 Updated 3 years ago
- A Python library to detect and extract listing data from HTML pages. ☆108 Updated 8 years ago
- Find which links on a web page are pagination links ☆29 Updated 8 years ago
- Scrapy downloader middleware that stores response HTMLs to disk. ☆18 Updated 11 months ago
- Word analysis, by domain, on the Common Crawl data set for the purpose of finding industry trends ☆56 Updated last year
- ☆59 Updated 3 years ago
- Scrapes sites. Gets news. Eventually events. ☆86 Updated 9 years ago
- Paginating the web ☆37 Updated 11 years ago
- Python client library for controlling Google Refine ☆40 Updated 11 years ago
- Search engine base (crawler, indexer and parser) using Python, Celery, RabbitMQ, CouchDB and Whoosh. ☆11 Updated last year
- Python binding for gumbo-parser using Cython ☆14 Updated 8 years ago
- This is a REST Server endpoint built using Flask and Python. ☆24 Updated 2 years ago
- Aho-Corasick string replacement utility ☆24 Updated 5 years ago
- Extract the differences between two HTML pages ☆32 Updated 6 years ago
- Clone of https://code.google.com/p/splitta/ so it can be a git submodule ☆34 Updated 11 years ago
- ArchiveKit manages data and documents during ETL processes, either on a local file system or on S3. ☆15 Updated 10 years ago
- Detect and classify pagination links ☆15 Updated 4 years ago
- A Python implementation of DEPTA ☆83 Updated 8 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- Pure Python script that takes a user query and summarizes news related to it. ☆25 Updated 2 years ago
- Keyword query search engine on semantic store/linked data web ☆9 Updated 9 years ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆97 Updated 2 years ago
- Restrict crawl and scraping scope using matchers. ☆25 Updated 8 years ago
- Code and slides for my PyGotham 2016 talk, "Higher-level Natural Language Processing with textacy" ☆15 Updated 8 years ago
- Pipeline for distributed Natural Language Processing, made in Python ☆64 Updated 8 years ago
- An online sentiment analyzer built with Flask and TextBlob ☆15 Updated 11 years ago
- Simple type converters: make ints, floats, bools and dates from your strings! ☆11 Updated 8 years ago
- A helper to create web scrapers using scrapy selector in a Model based structure ☆57 Updated 2 years ago
- WebAnnotator is a tool for annotating Web pages. WebAnnotator is implemented as a Firefox extension (https://addons.mozilla.org/en-US/fi… ☆48 Updated 3 years ago