The Common Crawl Crawler Engine and Related MapReduce code (2008-2012)
☆224Dec 22, 2022Updated 3 years ago
Alternatives and similar repositories for commoncrawl-crawler
Users that are interested in commoncrawl-crawler are comparing it to the libraries listed below.
- FoGFaaS: Add serverless computing (faas) to ifogsim☆22Mar 30, 2025Updated last year
- Common Crawl fork of Apache Nutch☆41Jun 3, 2026Updated last week
- Bixo is an open source web mining toolkit that runs as a series of Cascading pipes on top of Hadoop. By building a customized Cascading p…☆143Jul 7, 2022Updated 3 years ago
- Simple Samza Job Using Confluent Platform☆14Apr 14, 2016Updated 10 years ago
- Run Cassandra inside a Java project without bringing server deps into the client classpath☆32Feb 26, 2019Updated 7 years ago
- SKOS Support for Apache Lucene and Solr☆55May 12, 2021Updated 5 years ago
- Examples and Slides for "Introduction to Spring for Apache Hadoop" at SpringOne2GX 2014☆16Jan 7, 2019Updated 7 years ago
- Rails helpers for outputting preloading/prefetching metadata.☆19Jul 8, 2018Updated 7 years ago
- Java distributed crawler with a master/slave control mechanism☆14May 21, 2015Updated 11 years ago
- Distributed Realtime Search with Lucene and MongoDB☆60May 14, 2018Updated 8 years ago
- A fork of cascading patterns, but implemented for trident☆72Dec 16, 2023Updated 2 years ago
- Camus Compressor merges files created by Camus and saves them in a compressed format.☆13Mar 20, 2023Updated 3 years ago
- Example Node.js application demonstrating Cucumber.js usage☆42Jun 3, 2013Updated 13 years ago
- Source for Reactive Architecture: Beyond the Basics online training course☆16Jul 26, 2017Updated 8 years ago
- Pépin is a web image & video player with features like zoom, pan, comparisons, fullscreen, gapless videos playback, frame-by-frame scrubb…☆12Feb 3, 2017Updated 9 years ago
- Automatic, zero-config web scraping -- written in Java, has no dependency on Java EE or app servers, and the web scraper has a restful/JS…☆157Jun 25, 2017Updated 8 years ago
- A library for financial and time series calculations on Apache Spark☆28Feb 2, 2016Updated 10 years ago
- A serial text terminal written in Verilog for Tang SiPeed Primer FPGA☆15Dec 13, 2021Updated 4 years ago
- Tool for visualizing Apache Oozie pipelines☆13Feb 15, 2016Updated 10 years ago
- Repository containing scripts for importing OpenAlex snapshots into BigQuery☆15Mar 6, 2026Updated 3 months ago
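- MSAM is an API documentation manager, an interface management tool for generating interface files compatible with Swagger.json. This project is no longer maintained; please use the upgraded version.☆21May 23, 2023Updated 3 years ago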
- A unitypackaged mirror of Moq, for use in Unity3D☆17Aug 28, 2013Updated 12 years ago
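- Distributed crawler project from the Gao Ying lab at South China University of Technology; not to be distributed privately outside of lab members.☆21Jul 13, 2014Updated 11 years ago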
- DC/OS community content☆11May 16, 2018Updated 8 years ago
- Java audio routing and unit generator library.☆13Oct 13, 2020Updated 5 years ago
- The Scholix metadata schema is a set of properties describing a Link Information Package, which carries information about a link between …☆17Mar 14, 2022Updated 4 years ago
- The hub for all JATS4R meeting notes, examples, draft recommendations, documents, and issues.☆17Sep 8, 2019Updated 6 years ago
- Learning Spring 5.0, published by Packt☆10Oct 31, 2022Updated 3 years ago
- Sends repository usage statistics events to Matomo☆14May 11, 2026Updated last month
- Notes and cheat sheets on various topics☆25Dec 22, 2022Updated 3 years ago
- Start of an Internet draft on the separation between HTTP's semantic layer, framing layer(s), and the underlying transport layer.☆15Mar 22, 2016Updated 10 years ago
- Easily Deploy Code to AWS Lambda☆13Aug 15, 2018Updated 7 years ago
- An example implementation of React Native Vision Camera.☆15Mar 7, 2022Updated 4 years ago
- A distributed generic query layer for Apache Kafka Interactive Queries☆26Nov 8, 2017Updated 8 years ago
- Home of RDF2Go and RDFReactor☆13Jun 9, 2016Updated 10 years ago
- ☆13May 11, 2022Updated 4 years ago
- Domain name classifier looking for good vs. possibly malicious providers☆34May 4, 2018Updated 8 years ago
- This is a clone of an SVN repository at http://svn.terracotta.org/svn/ehcache. It had been cloned by http://svn2github.com/ , but the ser…☆13Jan 21, 2015Updated 11 years ago
- Elliptic Curve Cryptography☆13Mar 23, 2010Updated 16 years ago