Common Crawl fork of Apache Nutch
☆40 · Feb 26, 2026 · Updated this week
Alternatives and similar repositories for nutch
Users who are interested in nutch are comparing it to the libraries listed below
- A neural dependency parser that does its best ☆16 · Dec 12, 2025 · Updated 2 months ago
- A robust web archive analytics toolkit ☆132 · Oct 15, 2025 · Updated 4 months ago
- Respect generative AI opt-outs in your ML training pipeline. ☆39 · Oct 9, 2024 · Updated last year
- Romanian Word Embeddings. Here you can find pre-trained corpora of word embeddings. Current methods: CBOW, Skip-Gram, Fast-Text (from Gen… ☆13 · Oct 6, 2025 · Updated 4 months ago
- ☆10 · Jul 23, 2015 · Updated 10 years ago
- Security diagnostic quick start guide. Identifying the best measures and establishing specific security procedures for your organization. ☆11 · May 29, 2019 · Updated 6 years ago
- Console Server and Logger for IPMI and KVM based on tmux ☆16 · Jun 3, 2024 · Updated last year
- Upload a document image or PDF, or provide a URL, to convert it into a structured format using SmolDocling. ☆16 · Mar 31, 2025 · Updated 11 months ago
- CORE-ReID: Comprehensive Optimization and Refinement through Ensemble fusion in Domain Adaptation for person re-identification ☆15 · May 7, 2025 · Updated 9 months ago
- NLRB data scraper by LexPredict ☆12 · Dec 8, 2022 · Updated 3 years ago
- Collection of open source hypervolume codes that have been standardized to work with the MOEA Framework. ☆13 · Apr 6, 2024 · Updated last year
- Code for 2021 TACL paper on community-specific language ☆13 · Dec 8, 2022 · Updated 3 years ago
- Maven version of DistributeCrawler ☆10 · Jun 20, 2022 · Updated 3 years ago
- Using OpenVINO to speed up inference of PaddleOCR-VL model ☆25 · Updated this week
- ☆10 · Feb 26, 2019 · Updated 7 years ago
- Discover, analyze and present data from the web and mobile in meaningful ways ☆83 · Jul 16, 2013 · Updated 12 years ago
- Machine Learning solution for Kaggle.com's "Partly Sunny with a Chance of Hashtags" ☆27 · Dec 6, 2013 · Updated 12 years ago
- parquet dedupe estimator ☆25 · Feb 20, 2026 · Updated last week
- Chef cookbook for the http://druid.io/ ☆10 · Apr 25, 2016 · Updated 9 years ago
- Node.js module for the aREST framework ☆11 · Sep 25, 2018 · Updated 7 years ago
- Redux and RactiveJS example ☆10 · Mar 8, 2016 · Updated 9 years ago
- npm module for flickr api ☆14 · Jun 24, 2015 · Updated 10 years ago
- This repo outlines a method for differentiating between anomalies and expected outliers using the Microsoft Anomaly Detection API and Bin… ☆10 · Jun 11, 2017 · Updated 8 years ago
- A game engine made in Java using libgdx (Currently in alpha state, and probably will remain that way) ☆16 · Jan 4, 2012 · Updated 14 years ago
- ☆10 · Nov 26, 2024 · Updated last year
- Network Optix Meta Platform Docker source code and instructions used for launching Nx Meta and Powered-by-Nx products in Docker container… ☆16 · Dec 5, 2025 · Updated 2 months ago
- ☆10 · Mar 9, 2019 · Updated 6 years ago
- Modern GIS Web Client for JavaScript, based on MapboxGL-JS, OpenLayers, Leaflet ☆14 · Sep 16, 2022 · Updated 3 years ago
- Windows Live API binding and connect support. ☆18 · Dec 1, 2024 · Updated last year
- Tutorial on running keras model in C++ and python tensorflow ☆11 · Oct 30, 2018 · Updated 7 years ago
- Scripts, data and research related to cow weight and breed prediction ☆13 · Aug 24, 2025 · Updated 6 months ago
- Line shuffler for huge text file which does not fit in memory ☆13 · Dec 1, 2022 · Updated 3 years ago
- 🎨 Generator themes for Drawflow ☆12 · Feb 7, 2022 · Updated 4 years ago
- ☆14 · Dec 24, 2016 · Updated 9 years ago
- Another static site generator ☆11 · Apr 18, 2025 · Updated 10 months ago
- bigram / trigram analysis of wikipedia; mainly mutual info ☆22 · Mar 6, 2012 · Updated 13 years ago
- Yagnus Javascript Libraries ☆22 · Jul 22, 2013 · Updated 12 years ago
- A collection of neat tools related to the Xtend language. ☆10 · Feb 16, 2015 · Updated 11 years ago
- ☆11 · Aug 31, 2022 · Updated 3 years ago