Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem and storing it in various data repositories such as search engines.
☆199Mar 12, 2026Updated 2 weeks ago
Alternatives and similar repositories for crawlers
Users that are interested in crawlers are comparing it to the libraries listed below.
- Implementation of Norconex Committer for Elasticsearch.☆11Jan 4, 2022Updated 4 years ago
- Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, what…☆34Feb 21, 2026Updated last month
- Tools to customize your domain resolution rules. Uses BlackHole as the DNS server.☆18Jun 22, 2013Updated 12 years ago
- Generic library shared between several projects.☆14Feb 23, 2026Updated last month
- A scalable, mature and versatile web crawler based on Apache Storm☆972Updated this week
- Faceted Browsing over Wikidata triples☆18Jun 16, 2018Updated 7 years ago
- Attivio Search UI Toolkit (SUIT) is a library for creating search clients for searching the Attivio platform.☆12Oct 6, 2022Updated 3 years ago
- Open-source Enterprise Grade Search Engine Software☆513Sep 3, 2022Updated 3 years ago
- In this very simple Docker Swarm demo we create Docker hosts with Docker Machine and then install a small Elasticsearch cluster.☆12Jul 31, 2016Updated 9 years ago
- JNumberTools is an open-source Java library for solving complex problems in combinatorics and number theory. Whether you're a researcher,…☆14Mar 23, 2026Updated last week
- A 5 node zookeeper ensemble that runs in Docker☆17Dec 2, 2014Updated 11 years ago
- Spring Boot Web with Hessian☆11Jul 2, 2014Updated 11 years ago
- Fureteur is a simple, configurable, fault-tolerant web crawler written in Scala☆28Oct 14, 2014Updated 11 years ago
- Linked Data explorer and SPARQL endpoint☆23Dec 15, 2021Updated 4 years ago
- Flink image for Kubernetes that fixes the JobManager connection issue☆26Jul 31, 2018Updated 7 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node☆18Jun 9, 2022Updated 3 years ago
- ☆17May 25, 2015Updated 10 years ago
- Source code of crawlpod☆16Nov 20, 2015Updated 10 years ago
- The next generation of open source search☆94May 25, 2017Updated 8 years ago
- Blog crawler for the blogforever project.☆23Jan 31, 2014Updated 12 years ago
- A set of Java utilities that we could not find in Guava or Apache Commons...or we just felt like having our own version.☆22Mar 18, 2026Updated last week
- spark sql online editor☆13Dec 11, 2022Updated 3 years ago
- A set of reusable Java components that implement functionality common to any web crawler☆254Feb 26, 2026Updated last month
- Text Simplification System and Dataset☆15Jul 19, 2017Updated 8 years ago
- SWIM Protocol in Java☆10Apr 1, 2020Updated 5 years ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract☆32Oct 18, 2024Updated last year
- Spring Cloud Zuul routes health indicator☆11Dec 25, 2015Updated 10 years ago
- Spring Cloud Zuul Trie tree route matcher☆14Feb 1, 2016Updated 10 years ago
- NER toolkit for HTML data☆259May 3, 2024Updated last year
- Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.☆420Mar 30, 2023Updated 2 years ago
- Configure EdgeRouter to watch Movistar TV☆12Jul 28, 2020Updated 5 years ago
- Machine Learning with Elastic Stack - Second Edition, published by Packt☆18Jun 3, 2021Updated 4 years ago
- A random generator of Lua programs☆12Mar 5, 2026Updated 3 weeks ago
- PredictionIO Engine integrated with Sparkling Water. Open Source project Spring 2015 @CMU.☆12Oct 5, 2015Updated 10 years ago
- fetchIO is a simple, configurable, fault-tolerant web crawler written in Haskell☆23Feb 16, 2017Updated 9 years ago
- Web/FileSystem Crawler Library☆36Mar 16, 2026Updated last week
- Examples for osm4j☆11Jul 22, 2023Updated 2 years ago
- GUI program to generate windows and SQL audit files for nessus☆14Jun 23, 2017Updated 8 years ago
- Use VBB interactively, using a map.☆10Jan 11, 2022Updated 4 years ago