Norconex / crawlers
Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
☆196 Updated last week
Alternatives and similar repositories for crawlers
Users who are interested in crawlers are comparing it to the libraries listed below.
- A set of reusable Java components that implement functionality common to any web crawler ☆252 Updated last week
- A scalable, mature and versatile web crawler based on Apache Storm ☆961 Updated this week
- Open-source Enterprise Grade Search Engine Software ☆513 Updated 3 years ago
- Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark. ☆419 Updated 2 years ago
- The Apache Gora open source framework provides an in-memory data model and persistence for big data. ☆121 Updated last year
- The Common Crawl Crawler Engine and Related MapReduce code (2008-2012) ☆222 Updated 3 years ago
- Java library for parsing semi-structured text files ☆65 Updated 4 years ago
- Apache OpenNLP Sandbox ☆46 Updated this week
- Mirror of Apache ManifoldCF ☆82 Updated 3 weeks ago
- High-security graph database ☆64 Updated 3 years ago
- Fast in-memory graph structure, powering Gephi ☆79 Updated this week
- Common web archive utility code. ☆61 Updated last week
- Silk is a port of Kibana 4 project. ☆69 Updated 9 years ago
- Browser-driven explorer for lucene indexes ☆74 Updated 4 years ago
- Solr Redis Extensions ☆53 Updated 2 years ago
- Carrot2 plugin for ElasticSearch ☆295 Updated 3 years ago
- The next generation of open source search ☆93 Updated 8 years ago
- Solr AutoComplete implementation ☆59 Updated 8 years ago
- ModeShape is a distributed, hierarchical, transactional, and consistent data store with support for queries, full-text search, events, ve… ☆222 Updated 3 years ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆72 Updated last year
- A curated list of Awesome Apache Solr links and resources. ☆110 Updated 4 years ago
- The LAW next generation crawler. ☆90 Updated 4 years ago
- Java/JNI bindings to libpostal for fast international street address parsing/normalization ☆134 Updated 7 months ago
- A text tagger based on Lucene / Solr, using FST technology ☆177 Updated 2 years ago
- Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, what… ☆33 Updated 6 months ago
- Carrot2: Text Clustering Algorithms and Applications ☆847 Updated last week
- Storm / Solr Integration ☆19 Updated 2 years ago
- Java text categorization system ☆57 Updated 8 years ago
- SKOS Support for Apache Lucene and Solr ☆56 Updated 4 years ago
- Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration. ☆98 Updated 8 years ago