Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem to various data repositories such as search engines.
☆202Jun 4, 2026Updated last week
Alternatives and similar repositories for crawlers
Users that are interested in crawlers are comparing it to the libraries listed below.
- Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, what…☆35Apr 27, 2026Updated last month
- Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to netw…☆24Sep 25, 2024Updated last year
- Generic library shared between several projects.☆15May 19, 2026Updated 3 weeks ago
- A specific identifying code recognizer implemented in standard C++ using the Boost and libjpeg libraries.☆10May 15, 2015Updated 11 years ago
- UI Components for Solr☆11Apr 24, 2018Updated 8 years ago
- A scalable, mature and versatile web crawler based on Apache Storm☆979Updated this week
- HMAC authentication for RESTful web applications☆55Dec 5, 2024Updated last year
- Faceted Browsing over Wikidata triples☆18Jun 16, 2018Updated 7 years ago
- In this very simple Docker Swarm demo we create Docker hosts with Docker Machine and then install a small Elasticsearch cluster.☆12Jul 31, 2016Updated 9 years ago
- A pausable, progressive SQL engine for standalone and distributed OLTP/OLAP scenarios (for research only)☆12May 11, 2023Updated 3 years ago
- A 5 node zookeeper ensemble that runs in Docker☆17Dec 2, 2014Updated 11 years ago
- Neo4J database profiling utility☆41Aug 28, 2018Updated 7 years ago
- Spring Boot Web with Hessian☆11Jul 2, 2014Updated 11 years ago
- a Haskell clone for the JVM☆12Jul 9, 2015Updated 10 years ago
- ☆17May 25, 2015Updated 11 years ago
- The next generation of open source search☆94May 25, 2017Updated 9 years ago
- Blog crawler for the blogforever project.☆23Jan 31, 2014Updated 12 years ago
- A set of Java utilities that we could not find in Guava or Apache Commons...or we just felt like having our own version.☆23Updated this week
- Implicit relation extractor using a natural language model.☆24May 25, 2018Updated 8 years ago
- A set of reusable Java components that implement functionality common to any web crawler☆258Jun 3, 2026Updated last week
- Text Simplification System and Dataset☆15Jul 19, 2017Updated 8 years ago
- SWIM Protocol in Java☆10Apr 1, 2020Updated 6 years ago
- Named Entity Recognition and Pattern Mining☆22Mar 10, 2020Updated 6 years ago
- Spring Cloud Zuul routes health indicator☆11Dec 25, 2015Updated 10 years ago
- Spring Cloud Zuul Trie tree route matcher☆14Feb 1, 2016Updated 10 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.☆34May 3, 2023Updated 3 years ago
- Freedom for Media in Java - Git Mirror☆21Apr 16, 2026Updated last month
- Classifier for predicting user interests based on Twitter profile and using Python library scikit-learn.☆31Jun 7, 2013Updated 13 years ago
- NER toolkit for HTML data☆259May 3, 2024Updated 2 years ago
- Exploration of spark streaming based on the BigData.be project 2☆15Sep 2, 2013Updated 12 years ago
- A SQL-esque scripting language for spatial processing and ETL☆11Mar 4, 2019Updated 7 years ago
- fetchIO is a simple, configurable, fault-tolerant web crawler written in Haskell☆23Feb 16, 2017Updated 9 years ago
- Gecko crawler with distributed crawling support via Redis☆24Mar 11, 2018Updated 8 years ago
- Web/FileSystem Crawler Library☆37May 16, 2026Updated 3 weeks ago
- The Java Wikipedia API (Bliki engine) is a parser library for converting Wikipedia wikitext notation to HTML.☆16Jan 28, 2026Updated 4 months ago
- Realtime Analytics☆41Mar 27, 2012Updated 14 years ago
- GUI program to generate windows and SQL audit files for nessus☆14Jun 23, 2017Updated 8 years ago
- Android UI element for displaying and editing an opening hours value☆10Apr 20, 2026Updated last month
- Argument and options parser for java☆18Nov 7, 2018Updated 7 years ago