bejean/crawl-anywhere

bejean / crawl-anywhere

Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.

☆99

Alternatives and similar repositories for crawl-anywhere

Users who are interested in crawl-anywhere are comparing it to the libraries listed below.

OlivierBlanvillain / crawler
Blog crawler for the blogforever project.
☆23 · Jan 31, 2014 · Updated 12 years ago
weblyzard / ewrt
extensible Web Retrieval Toolkit
☆17 · Jun 2, 2022 · Updated 4 years ago
sfcta / androidtracks
Android Tracks
☆30 · Apr 28, 2022 · Updated 4 years ago
BayanGroup / nutch-custom-search
☆67 · Dec 11, 2016 · Updated 9 years ago
commoncrawl / commoncrawl-examples
A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)
☆66 · Aug 5, 2016 · Updated 9 years ago
svetlyak40wt / scrapy-useragents
A middleware to use random user agent in Scrapy crawler.
☆33 · Dec 15, 2012 · Updated 13 years ago
omakei / hrms-filament
Human resource management system implemented with filament php.
☆14 · Dec 28, 2022 · Updated 3 years ago
danielstieger / moware35
MoWare 2019.X - mrs branch
☆32 · Dec 2, 2025 · Updated 6 months ago
SagarPrasad / opennlp-examples
opennlp-solr-examples
☆10 · Jul 1, 2022 · Updated 3 years ago
sloria / textfeel-web
An online sentiment analyzer built with Flask and TextBlob
☆15 · Sep 3, 2013 · Updated 12 years ago
ATLANTBH / nutch-plugins
Apache Nutch extensions
☆34 · Mar 21, 2022 · Updated 4 years ago
rayokota / generator-angular-nancy
Yeoman generator for AngularJS + Nancy
☆14 · May 26, 2015 · Updated 11 years ago
mccraigmccraig / opennlp
mirror of opennlp.sourceforge.net
☆12 · Dec 8, 2009 · Updated 16 years ago
2gis / kafka-connect-hdfs-ext
Set of extensions for kafka connect hdfs
☆11 · May 12, 2021 · Updated 5 years ago
Westly / CommanderPrecons
CSV and JSON files of all official Magic the Gathering pre-constructed decks (sourced from Moxfield)
☆15 · Apr 4, 2026 · Updated 2 months ago
bbc / bbcrd-synth-study
Scripts and Instructions for training and synthesising artificial voices
☆12 · Mar 27, 2024 · Updated 2 years ago
ilikerobots / django-vue-utilities
☆14 · Oct 3, 2023 · Updated 2 years ago
elelias / ebay
machine-learning techniques on ebay data
☆15 · Oct 31, 2013 · Updated 12 years ago
RedisLabs / spark-timeseries
A library for financial and time series calculations on Apache Spark
☆28 · Feb 2, 2016 · Updated 10 years ago
shanbady / NLTK-Boston-Python-Meetup
December 14th Python Meetup Files
☆40 · Mar 2, 2013 · Updated 13 years ago
liberit / scraptils
scraper related helper functions
☆28 · Jun 28, 2014 · Updated 12 years ago
fanshi118 / NLP_NMT_Project
Neural Machine Translation project for NLP Fall 2016
☆10 · Dec 20, 2016 · Updated 9 years ago
puntofisso / SpikesMap
Replicating in Python the electoral maps made by the Berliner Morgenpost
☆15 · Dec 24, 2019 · Updated 6 years ago
feiskyer / scrapy-examples
Some scrapy and web.py examples
☆79 · May 20, 2017 · Updated 9 years ago
fmorbini / jmNL
modular NL platform for dialogue agents
☆17 · Oct 26, 2017 · Updated 8 years ago
City-of-Turku / kada
KADA – Kuntien avoin digialusta (open digital platform for municipalities)
☆12 · Oct 5, 2022 · Updated 3 years ago
lucidworks / storm-solr
Storm / Solr Integration
☆19 · Feb 2, 2024 · Updated 2 years ago
speedment / speedment-code-samples
Code samples for the Speedment ORM
☆13 · Jun 21, 2022 · Updated 4 years ago
jeresig / deepleap
☆34 · Jan 13, 2022 · Updated 4 years ago
camme / examples
Examples
☆12 · Feb 18, 2014 · Updated 12 years ago
zhangw / phantomjs_search_weibo
search topics of sina weibo by phantomjs
☆12 · Dec 20, 2015 · Updated 10 years ago
julien-duponchelle / scrapy-graphite
Output scrapy statistics to graphite/carbon
☆54 · Mar 9, 2013 · Updated 13 years ago
shutterstock / shutterstock-heatmap-toolkit
Shutterstock's interactive heatmap toolkit powered by heatmap.js and Solr
☆37 · Jul 7, 2022 · Updated 3 years ago
javasoze / meaningfulweb
Web page content extractor
☆32 · Feb 26, 2013 · Updated 13 years ago
OpenNMT / Presentations
Presentations documents related to OpenNMT talk or events
☆14 · Mar 13, 2018 · Updated 8 years ago
wikimedia / language-data
Language data and utilities
☆18 · Jun 18, 2026 · Updated last week
hplt-project / OpusTrainer
Curriculum training
☆22 · Jun 25, 2025 · Updated last year
rakali / pandoc-schemata
JSON Schema files for Pandoc JSON
☆14 · Aug 19, 2014 · Updated 11 years ago
gansidui / bktree
bk-tree for golang
☆11 · Jul 30, 2022 · Updated 3 years ago