Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or filesystem and storing it in various data repositories such as search engines.
☆200Apr 3, 2026Updated 2 weeks ago
Alternatives and similar repositories for crawlers
Users that are interested in crawlers are comparing it to the libraries listed below.
- Norconex Importer is a Java library and command-line application meant to "parse" and "extract" content out of a file as plain text, what…☆35Feb 21, 2026Updated last month
- UI Components for Solr☆11Apr 24, 2018Updated 7 years ago
- A scalable, mature and versatile web crawler based on Apache Storm☆974Updated this week
- HMAC authentication for RESTful web applications☆54Dec 5, 2024Updated last year
- Sitecore Experience Accelerator Showcase Site☆17Jan 28, 2021Updated 5 years ago
- Test resources support☆11Apr 12, 2026Updated last week
- In this very simple Docker Swarm Demo we create Docker hosts with Docker Machine and then install a small Elasticsearch cluster.☆12Jul 31, 2016Updated 9 years ago
- A 5 node zookeeper ensemble that runs in Docker☆17Dec 2, 2014Updated 11 years ago
- Spring Boot Web with Hessian☆11Jul 2, 2014Updated 11 years ago
- Linked Data explorer and SPARQL endpoint☆23Dec 15, 2021Updated 4 years ago
- Neural Solr = Solr 9 + Mighty Inference + Node☆18Jun 9, 2022Updated 3 years ago
- ☆17May 25, 2015Updated 10 years ago
- The next generation of open source search☆94May 25, 2017Updated 8 years ago
- A set of Java utilities that we could not find in Guava or Apache Commons...or we just felt like having our own version.☆23Apr 11, 2026Updated last week
- ZIO-inspired APIs for Kyo☆12Apr 18, 2024Updated 2 years ago
- Elasticsearch proxy for Quepid.☆13Oct 30, 2025Updated 5 months ago
- A Consul Client for Java☆13Apr 2, 2026Updated 2 weeks ago
- A set of reusable Java components that implement functionality common to any web crawler☆255Feb 26, 2026Updated last month
- SWIM Protocol in Java☆10Apr 1, 2020Updated 6 years ago
- Code for the paper Faster Phrase-Based Decoding by Refining Feature State☆14Jan 9, 2023Updated 3 years ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract☆32Oct 18, 2024Updated last year
- ☆12Feb 23, 2023Updated 3 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.☆34May 3, 2023Updated 2 years ago
- CuVS integration for Lucene☆38Jun 17, 2025Updated 10 months ago
- Teiid Designer is a visual tool that enables rapid, model-driven definition, integration, management and testing of data services without…☆35Dec 13, 2022Updated 3 years ago
- Functional programming ideas for Groovy☆15Jul 17, 2015Updated 10 years ago
- A SQL-esque scripting language for spatial processing and ETL☆11Mar 4, 2019Updated 7 years ago
- Machine Learning with Elastic Stack - Second Edition, published by Packt☆18Jun 3, 2021Updated 4 years ago
- Hide menu bar icons for third-party apps on macOS☆14Dec 23, 2016Updated 9 years ago
- Gecko crawler with distributed crawling support via Redis☆24Mar 11, 2018Updated 8 years ago
- The Java Wikipedia API (Bliki engine) is a parser library for converting Wikipedia wikitext notation to HTML.☆16Jan 28, 2026Updated 2 months ago
- Web/FileSystem Crawler Library☆37Apr 12, 2026Updated last week
- An experimental multi-tenant distributed system platform☆59Nov 12, 2024Updated last year
- Examples for osm4j☆11Jul 22, 2023Updated 2 years ago
- Use VBB interactively, using a map.☆10Jan 11, 2022Updated 4 years ago
- Realtime Analytics☆41Mar 27, 2012Updated 14 years ago
- GUI program to generate windows and SQL audit files for nessus☆14Jun 23, 2017Updated 8 years ago
- Cyberinfrastructure Shell (CIShell) is an open source, community-driven framework/application for the integration and utilization of data…☆31Nov 28, 2018Updated 7 years ago
- App Metrics Extensions for Elasticsearch reporting☆21Nov 4, 2019Updated 6 years ago