Crawl-Anywhere - Web Crawler and document processing pipeline with Solr integration.
☆98Jul 1, 2017Updated 8 years ago
Alternatives and similar repositories for crawl-anywhere
Users who are interested in crawl-anywhere are comparing it to the libraries listed below
- Blog crawler for the blogforever project.☆23Jan 31, 2014Updated 12 years ago
- Fureteur is a simple, configurable, fault-tolerant web crawler written in Scala☆28Oct 14, 2014Updated 11 years ago
- An online sentiment analyzer built with Flask and TextBlob☆15Sep 3, 2013Updated 12 years ago
- A simple library for loading word2vec binary model.☆12Sep 17, 2015Updated 10 years ago
- This is a Fact based Question Answering System using Apache Solr as backend search engine, Wikipedia dumps as information source, Apache …☆26Jan 21, 2026Updated last month
- Android Tracks☆30Apr 28, 2022Updated 3 years ago
- ☆66Dec 11, 2016Updated 9 years ago
- extensible Web Retrieval Toolkit☆17Jun 2, 2022Updated 3 years ago
- Evolving expressions using genetic algorithms☆17Jun 20, 2020Updated 5 years ago
- modular NL platform for dialogue agents☆17Oct 26, 2017Updated 8 years ago
- Apache Nutch extensions☆34Mar 21, 2022Updated 3 years ago
- Storm / Solr Integration☆19Feb 2, 2024Updated 2 years ago
- scraper related helper functions☆27Jun 28, 2014Updated 11 years ago
- fetchIO is a simple, configurable, fault-tolerant web crawler written in Haskell☆23Feb 16, 2017Updated 9 years ago
- Shutterstock's interactive heatmap toolkit powered by heatmap.js and Solr☆37Jul 7, 2022Updated 3 years ago
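- Deep learning uses information extracted by the deep structure of convolutional networks. Convolutional networks are currently used mainly for image recognition and classification, but their intermediate layers contain a wealth of useful information, which is the basis of style transfer. If you study the layers of a CNN, you find that the activation state of the neurons in each layer corresponds to a particular kind of information; the lower the layer, the closer it is to the texture information of the image, such as…☆10Aug 25, 2021Updated 4 years ago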
- A library for financial and time series calculations on Apache Spark☆28Feb 2, 2016Updated 10 years ago
- Cloud Mining automatically builds exploratory faceted search systems.☆52Oct 15, 2013Updated 12 years ago
- ☆16Dec 23, 2024Updated last year
- A Sublime Text plugin to move through and reform things☆179Sep 28, 2023Updated 2 years ago
- The goal of this experiment is to take articles and certain metadata and group them by topic.☆11Apr 14, 2016Updated 9 years ago
- Java implementation of the EbMS 2.0 specification.☆10Feb 20, 2026Updated 2 weeks ago
- A Data Mesh demo repository☆13Oct 10, 2024Updated last year
- This project introduces a user interface for two calendars, Gregorian and Hijri, integrated with Ummalqura. User can select a date from any…☆10Aug 20, 2017Updated 8 years ago
- Structured Data Extractor. An application to extract structured data from web pages. It uses Data Extraction Based on Partial Tree Alignm…☆49Jun 9, 2012Updated 13 years ago
- Sort-friendly URI Reordering Transform (SURT) python module☆45Sep 11, 2025Updated 5 months ago
- Modularly extensible semantic metadata validator☆85Dec 10, 2015Updated 10 years ago
- An open-source news aggregator☆15Sep 9, 2016Updated 9 years ago
- generate custom supreme box logos☆13Nov 28, 2017Updated 8 years ago
- Human resource management system implemented with Filament PHP.☆13Dec 28, 2022Updated 3 years ago
- Green SqlAlchemy extensions for pulsar☆11Nov 24, 2017Updated 8 years ago
- Searchable dropdown component Laravel Package☆10Feb 16, 2025Updated last year
- PacketZoom SDK for React Native☆11Sep 21, 2018Updated 7 years ago
- Performs multi document summarization. Includes a method to generate summaries: The method uses a sentence importance score calculator ba…☆38Apr 7, 2013Updated 12 years ago
- Entity Linking for the masses☆56Nov 10, 2015Updated 10 years ago
- GitHub arsenal☆21May 24, 2017Updated 8 years ago
- Bicycle Incident reporting☆13Jul 22, 2022Updated 3 years ago
- An attempt at creating a gold standard dataset for backtesting yesterday & today's content-extractors☆35Mar 19, 2015Updated 10 years ago
- Flask app for monitoring OEE☆11Sep 25, 2023Updated 2 years ago