For interacting with Nutch via Python
☆29 · Feb 18, 2026 · Updated last week
Alternatives and similar repositories for nutchpy
Users that are interested in nutchpy are comparing it to the libraries listed below
- Nutch-Python is a Python binding to the Apache Nutch™ REST services allowing Nutch to be called natively in the Python community. ☆39 · Apr 15, 2016 · Updated 9 years ago
- Stream Processing ToolKit ☆18 · Aug 14, 2015 · Updated 10 years ago
- tool for validating conda recipes and conda packages ☆13 · Aug 15, 2024 · Updated last year
- Browser-based annotation tool for Framenet ☆16 · Jan 27, 2015 · Updated 11 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆38 · Apr 9, 2024 · Updated last year
- open source, distributed, restful crawler engine ☆14 · Feb 3, 2015 · Updated 11 years ago
- a framework and language for exploring and analyzing feeds of social media data. ☆23 · Jan 25, 2012 · Updated 14 years ago
- Seed acquisition tool to bootstrap focused crawlers ☆23 · Apr 24, 2017 · Updated 8 years ago
- The HDF5 Cloud Optimized Read Only Python Package ☆28 · Dec 11, 2025 · Updated 2 months ago
- Gaia is a geospatial analysis library jointly developed by Kitware and Epidemico. ☆33 · Apr 8, 2019 · Updated 6 years ago
- ☆32 · Jul 6, 2015 · Updated 10 years ago
- Messing about with a Cricut machine. This stuff is probably proprietary code, so I'm likely to get sued by doing this. ☆14 · Dec 17, 2013 · Updated 12 years ago
- The goal of this experiment is to take articles and certain metadata and group them by topic. ☆11 · Apr 14, 2016 · Updated 9 years ago
- This is a repository where all GitHub For Dummies readers can add a link to their GitHub profile! ☆16 · Dec 12, 2025 · Updated 2 months ago
- Cloud Mining automatically builds exploratory faceted search systems. ☆52 · Oct 15, 2013 · Updated 12 years ago
- Python wrappers for the FirecREST API ☆12 · Dec 23, 2025 · Updated 2 months ago
- This is my main Java library for all kinds of datastructures, algorithms and everything else that I need. ☆73 · Jun 14, 2023 · Updated 2 years ago
- Extending the HDF5 library to support intelligent I/O buffering for deep memory and storage hierarchy systems ☆34 · Feb 17, 2025 · Updated last year
- An open-source news aggregator ☆15 · Sep 9, 2016 · Updated 9 years ago
- ☆12 · Oct 25, 2015 · Updated 10 years ago
- Simple MapReduce implementation in Python, for text file parallel processing ☆20 · Mar 3, 2012 · Updated 13 years ago
- A simple maintenance tracking tool for your vehicles. ☆12 · Nov 1, 2025 · Updated 4 months ago
- Digitization information system built on top of Fedora repository ☆16 · Jan 15, 2019 · Updated 7 years ago
- Green SqlAlchemy extensions for pulsar ☆11 · Nov 24, 2017 · Updated 8 years ago
- Narwhal is a keyword and KEY NARRATIVE manager that creates language-aware classes. Because Narwhal does not use NLP it avoids complexity… ☆12 · Oct 16, 2018 · Updated 7 years ago
- Stac-fastapi implementation with DuckDB backend. ☆15 · Sep 14, 2025 · Updated 5 months ago
- extended benchmarking automation tool for HPC applications ☆16 · Updated this week
- Focused Crawler for VT's CTRNet ☆10 · May 13, 2013 · Updated 12 years ago
- Cloyster HPC is a turnkey HPC cluster solution with a user-friendly installer ☆10 · Oct 2, 2025 · Updated 5 months ago
- Minimal web-based client for NewsBlur. ☆20 · Dec 7, 2014 · Updated 11 years ago
- Lustre HSM tools ☆10 · Feb 19, 2024 · Updated 2 years ago
- ☆13 · Sep 13, 2015 · Updated 10 years ago
- WarcMiddleware lets users seamlessly download a mirror copy of a website when running a web crawl with the Python web crawler Scrapy. ☆48 · Mar 19, 2018 · Updated 7 years ago
- An open source information retrieval system written in C++11 and Python. Aspires to be an alternative to Nutch / Lucene. It uses MongoDB … ☆87 · Jun 22, 2023 · Updated 2 years ago
- An implementation of Mikolov's word2vec in Python using Theano and Lasagne. ☆37 · Jul 17, 2017 · Updated 8 years ago
- PECL MySQL X DevAPI ☆18 · Jul 27, 2024 · Updated last year
- ☆16 · Jul 23, 2024 · Updated last year
- Traffic Counts Database ☆11 · Apr 28, 2022 · Updated 3 years ago
- sparql-stream sensor queries ☆16 · Sep 28, 2016 · Updated 9 years ago