Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem and storing it in various data repositories such as search engines.
☆200May 12, 2026Updated last week
Alternatives and similar repositories for crawlers
Users that are interested in crawlers are comparing it to the libraries listed below.
- Implementation of Norconex Committer for Elasticsearch.☆11Apr 27, 2026Updated 3 weeks ago
- Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, what…☆35Apr 27, 2026Updated 3 weeks ago
- FoGFaaS: Add serverless computing (faas) to ifogsim☆22Mar 30, 2025Updated last year
- A scalable, mature and versatile web crawler based on Apache Storm☆976Updated this week
- Enterprise Open Source IM Solution☆13Aug 13, 2019Updated 6 years ago
- Focused on solving Java component conflict problems.☆25Dec 9, 2022Updated 3 years ago
- Open-source Enterprise Grade Search Engine Software☆516Sep 3, 2022Updated 3 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)☆65Aug 5, 2016Updated 9 years ago
- Flink image for Kubernetes that fixes a JobManager connection issue☆26Jul 31, 2018Updated 7 years ago
- Source code of crawlpod☆16Nov 20, 2015Updated 10 years ago
- The next generation of open source search☆94May 25, 2017Updated 8 years ago
- Blog crawler for the blogforever project.☆23Jan 31, 2014Updated 12 years ago
- Extensive API/App that visualizes the social capital ranking of celebrities based on social media/news☆14Feb 12, 2016Updated 10 years ago
- A set of Java utilities that we could not find in Guava or Apache Commons...or we just felt like having our own version.☆23May 7, 2026Updated last week
- Implicit relation extractor using a natural language model.☆24May 25, 2018Updated 7 years ago
- A set of reusable Java components that implement functionality common to any web crawler☆256Apr 27, 2026Updated 3 weeks ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract☆32Oct 18, 2024Updated last year
- Named Entity Recognition and Pattern Mining☆22Mar 10, 2020Updated 6 years ago
- Spring Cloud Zuul routes health indicator☆11Dec 25, 2015Updated 10 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.☆34May 3, 2023Updated 3 years ago
- A maven plugin to create an apt repository.☆20Jun 27, 2024Updated last year
- Classifier for predicting user interests based on Twitter profile and using Python library scikit-learn.☆31Jun 7, 2013Updated 12 years ago
- A showcase of UIkit websites and themes☆26Jul 10, 2020Updated 5 years ago
- Distributed scaffolding framework (summary and notes)☆15Aug 27, 2015Updated 10 years ago
- Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.☆423Mar 30, 2023Updated 3 years ago
- Exploration of spark streaming based on the BigData.be project 2☆15Sep 2, 2013Updated 12 years ago
- Configure EdgeRouter to watch Movistar TV☆12Jul 28, 2020Updated 5 years ago
- TeGere! = Behave! — a Gherkin library for Clojure☆13Oct 12, 2023Updated 2 years ago
- Machine Learning with Elastic Stack - Second Edition, published by Packt☆18Jun 3, 2021Updated 4 years ago
- fetchIO is a simple, configurable, fault-tolerant web crawler written in Haskell☆23Feb 16, 2017Updated 9 years ago
- Gecko crawler with distributed crawling support via Redis☆24Mar 11, 2018Updated 8 years ago
- Web/FileSystem Crawler Library☆37May 5, 2026Updated 2 weeks ago
- Argument and options parser for java☆18Nov 7, 2018Updated 7 years ago
- Teiid Designer is a visual tool that enables rapid, model-driven definition, integration, management and testing of data services without…☆35Dec 13, 2022Updated 3 years ago
- Run JavaScript from Java in a safe sandbox.☆63Updated this week
- Cyberinfrastructure Shell (CIShell) is an open source, community-driven framework/application for the integration and utilization of data…☆31Nov 28, 2018Updated 7 years ago
- An experimental multi-tenant distributed system platform☆59Nov 12, 2024Updated last year
- A lightweight reactive RPC-like system built on Akka IO☆45Apr 23, 2015Updated 11 years ago
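- A programming framework abstracted for Java systems that implement complex business logic; it borrows design methods from domain modeling to make the development experience cleaner and friendlier and to greatly improve the long-term maintainability of the code☆24Aug 3, 2014Updated 11 years ago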