Norconex / crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
☆194Updated last week
Alternatives and similar repositories for crawlers
Users who are interested in crawlers are comparing it to the libraries listed below
- A set of reusable Java components that implement functionality common to any web crawler☆246Updated 3 weeks ago
- Open-source Enterprise Grade Search Engine Software☆509Updated 3 years ago
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)☆219Updated 2 years ago
- Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.☆417Updated 2 years ago
- Mirror of Apache ManifoldCF☆80Updated last month
- A curated list of Awesome Apache Solr links and resources.☆109Updated 3 years ago
- Apache OpenNLP Sandbox☆44Updated last week
- The next generation of open source search☆93Updated 8 years ago
- The Apache Gora open source framework provides an in-memory data model and persistence for big data.☆122Updated last year
- Silk is a port of Kibana 4 project.☆70Updated 9 years ago
- High-security graph database☆64Updated 3 years ago
- Carrot2: Text Clustering Algorithms and Applications☆824Updated this week
- Distributed processing framework for search solutions☆82Updated 2 years ago
- Apache Anything To Triples (Any23) is a library, a web service and a command line tool that extracts structured data in RDF format from a…☆98Updated 2 years ago
- Java/JNI bindings to libpostal for fast international street address parsing/normalization☆126Updated 2 months ago
- Java text categorization system☆57Updated 8 years ago
- Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.☆97Updated 8 years ago
- Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.☆283Updated 7 years ago
- The LAW next generation crawler.☆89Updated 3 years ago
- SKOS Support for Apache Lucene and Solr☆56Updated 4 years ago
- Java library for reading and writing WARC files with a typed API☆50Updated last week
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW…☆73Updated last year
- An ORM / OGM for the TinkerPop graph stack.☆138Updated 3 years ago
- Java access to Neo4J graph databases at multiple levels of abstraction☆85Updated 4 years ago
- Fast in-memory graph structure, powering Gephi☆74Updated 2 weeks ago
- Solr AutoComplete implementation☆59Updated 7 years ago
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services☆27Updated last month
- UADetector is a library to identify over 190 different desktop and mobile browsers and 130 other User-Agents like feed readers, email cli…☆248Updated 3 years ago
- Browser-driven explorer for lucene indexes☆74Updated 3 years ago
- TinkerPop3 Graph Structure Implementation for OrientDB☆94Updated last week