b-cube/nutch-crawler

Apache Nutch fork tuned for web services and data discovery.

☆10

Alternatives and similar repositories for nutch-crawler

Users that are interested in nutch-crawler are comparing it to the libraries listed below.

eleflow / nutch-aws
☆25 • Apr 6, 2015 • Updated 11 years ago
slalombuild / fusion
🧬 Generate secure by default cloud infrastructure configuration with Go and Terraform.
☆12 • Jan 23, 2024 • Updated 2 years ago
meabed / nutch-cassandra-docker
Nutch with Cassandra and Elasticsearch on Docker
☆17 • Oct 26, 2021 • Updated 4 years ago
whitehead / plaac
Prion-Like Amino Acid Composition
☆18 • Dec 15, 2025 • Updated 7 months ago
rbd80 / Amazon_Linux_2
Hardening of the AMS Linux 2
☆11 • May 14, 2018 • Updated 8 years ago
cazuzabarberino / fake-trello
A simple task manager for personal use, inspired by the Trello interface.
☆15 • Sep 11, 2020 • Updated 5 years ago
gabrielg / mail_to_hip_chat
Funnels email into HipChat
☆18 • Jan 27, 2015 • Updated 11 years ago
momer / nutch-selenium
☆28 • Jun 9, 2016 • Updated 10 years ago
jddeal / python-cmr
A python wrapper to the NASA Common Metadata Repository API
☆20 • Oct 14, 2021 • Updated 4 years ago
amazon-archives / streaming-analytics-pipeline
WARNING- This package is no longer supported and will be replaced in the near future. A solution that enables customers to easily create …
☆16 • Mar 28, 2018 • Updated 8 years ago
ourresearch / total-impact-webapp
The web frontend for http://impactstory.org. Calls the backend api code in total-impact-core github repo.
☆25 • Oct 21, 2016 • Updated 9 years ago
Caleydo / org.caleydo.vis.lineup.demos
LineUp Demos
☆25 • Dec 15, 2016 • Updated 9 years ago
trec-kba / streamcorpus
common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text
☆35 • Sep 30, 2016 • Updated 9 years ago
awslabs / aws-akka-firehose
An Akka actor that writes JSON data into Amazon Kinesis Firehose.
☆14 • Jan 14, 2026 • Updated 6 months ago
Ecohen4 / data-viz
Teaching data visualization at Columbia University.
☆10 • Oct 2, 2015 • Updated 10 years ago
jbarcia / PCI-Audit-Script
☆23 • Dec 3, 2020 • Updated 5 years ago
alexlancaster / pypop
PyPop: Python for Population Genomics
☆25 • Updated this week
thedataincubator / excel-to-python
Examples of how Python can speed up tasks that are cumbersome in Excel
☆13 • Oct 5, 2016 • Updated 9 years ago
joshua-decoder / thrax
Hadoop-based tool for extraction of large scale synchronous grammars for paraphrasing and machine translation
☆15 • Dec 2, 2016 • Updated 9 years ago
pcodding / hadoop_ctakes
Hadoop integration code for working with Apache cTAKES
☆10 • Feb 11, 2014 • Updated 12 years ago
pipauwel / ifcParserLib
ifcParserLib is a set of reusable Java components that implement functionality for IFC file parsing.
☆10 • Oct 14, 2020 • Updated 5 years ago
amngibson / metasploit-runner
This is a gem that provides the ability to create a workspace, import scan data from nexpose, and perform a webscan, a web audit, and per…
☆10 • Dec 13, 2017 • Updated 8 years ago
tagbase / tagbase
A data management system for electronic tags on marine animals
☆13 • Mar 31, 2025 • Updated last year
OPENDAP / bes
The BES framework, which forms the basis for the Hyrax server
☆16 • Updated this week
thedataincubator / pydata2016
A couple projects using scikit-learn illustrating project decision making.
☆15 • Oct 8, 2016 • Updated 9 years ago
thedataincubator / ds30_3
Data Science in 30 Minutes #3 - Holt-Winters and exponential smoothing
☆17 • Jun 20, 2016 • Updated 10 years ago
jmctee / Cassandra-Client-Tutorial
Example of using Java to access Cassandra with Hector and Astyanax libraries
☆16 • Sep 30, 2012 • Updated 13 years ago
fgd-haha / shopping
☆13 • Apr 11, 2022 • Updated 4 years ago
qpleple / online-lda-vb
[Experiment] Online Latent Dirichlet Allocation implementation in python
☆16 • May 10, 2013 • Updated 13 years ago
OpenScienceMOOC / Module-1-Open-Principles
Module 1: Open Principles
☆37 • Nov 14, 2019 • Updated 6 years ago
ipfs-inactive / window.ipfs-fallback
[DEPRECATED] Use ipfs-provider instead:
☆11 • May 13, 2020 • Updated 6 years ago
apache / incubator-ponymail-site
Apache Pony Mail
☆14 • Updated this week
chrismattmann / trec-dd-polar
A dataset downloaded from the deep and scientific web across three major Polar data centers for use in research.
☆13 • Sep 8, 2017 • Updated 8 years ago
unifio / packer-provisioner-serverspec
Packer Serverspec remote provisioner
☆32 • Feb 25, 2023 • Updated 3 years ago
mdredze / carmen
Geolocation for Twitter.
☆44 • Dec 4, 2014 • Updated 11 years ago
nasa / cmr-metadata-review
The CMR Metadata Review tool is used to curate NASA EOSDIS collection and granule level metadata in CMR for correctness, completeness and…
☆26 • Sep 4, 2025 • Updated 10 months ago
PREMIS-OWL-Revision-Team / premis-owl
Repository for revision of PREMIS OWL ontology group
☆13 • May 12, 2022 • Updated 4 years ago
Reading-eScience-Centre / pycovjson
Create CovJSON files from common scientific data formats
☆14 • Apr 24, 2018 • Updated 8 years ago
flowchartsman / newsbot
A twitter streaming, website-scraping, websocket-transporting news delivery webapp written in Go
☆10 • Jul 17, 2015 • Updated 11 years ago