Launch AWS Elastic MapReduce jobs that process Common Crawl data.
☆49Feb 15, 2017Updated 9 years ago
Alternatives and similar repositories for elasticrawl
Users who are interested in elasticrawl are comparing it to the libraries listed below.
- Hadoop jobs for WikiReverse project. Parses Common Crawl data for links to Wikipedia articles.☆38Aug 12, 2018Updated 7 years ago
- A simple Ruby example of how to process Common Crawl files using Elastic MapReduce☆29Mar 25, 2012Updated 14 years ago
- Python script to create CDX index files of WARC data☆16Sep 7, 2018Updated 7 years ago
- Docker containers for running VIVO☆13Oct 26, 2016Updated 9 years ago
- Fcrepo4 webapp plus optional fcrepo dependencies☆13Sep 30, 2020Updated 5 years ago
- Rails engine for working with storage of OpenAnnotations stored in Fedora4☆13Aug 4, 2016Updated 9 years ago
- ☆17Apr 19, 2025Updated 11 months ago
- Apache Nutch fork tuned for web services and data discovery.☆10May 18, 2015Updated 10 years ago
- Django app for managing PREMIS Events☆14Mar 9, 2026Updated 3 weeks ago
- Templates for form letters to Canadian MPs☆20Jan 30, 2017Updated 9 years ago
- Ansible deployment of fedora 4, single or clustered on ubuntu 14.04☆10Nov 11, 2015Updated 10 years ago
- A collection of ready-to-use messaging applications with fcrepo-camel☆12Dec 12, 2025Updated 3 months ago
- ARK minter, binder, resolver☆23Feb 25, 2026Updated last month
- ☆14Sep 13, 2014Updated 11 years ago
- A generic data anomaly finder. You can use a beautiful web page, drag-and-drop your csv dataset and easily find the top N anomalies in th…☆33Oct 13, 2022Updated 3 years ago
- Events and Situations Ontology☆14Apr 20, 2018Updated 7 years ago
- The JSON files from CourtListener.com for the Supreme Court of the United States☆11Jul 9, 2015Updated 10 years ago
- A crawler based on Phantom. Allows discovery of dynamic content and supports custom scrapers.☆24May 31, 2017Updated 8 years ago
- Ansible Roles and Playbooks for Princeton University Library☆19Mar 22, 2026Updated last week
- Fedora API Specification☆17May 6, 2021Updated 4 years ago
- Several scripts to analyse Wikidata dumps☆33Apr 7, 2014Updated 11 years ago
- This is a gem that provides the ability to create a workspace, import scan data from nexpose, and perform a webscan, a web audit, and per…☆10Dec 13, 2017Updated 8 years ago
- Repository for the markdownlint-mdl-action GitHub Action☆25Dec 26, 2025Updated 3 months ago
- A simple task orchestration library for running complex processes or workflows in Ruby☆28Oct 4, 2024Updated last year
- Set of scripts to aid in the download of the GDELT data files from gdelt.utdallas.edu☆18May 14, 2014Updated 11 years ago
- Phoenix script for mac Virtual Spaces☆12Apr 17, 2025Updated 11 months ago
- ☆10May 18, 2017Updated 8 years ago
- Twitter command line client example (ne scala 2015)☆15Apr 7, 2015Updated 10 years ago
- An attempt to re-create a KISS TNC algorithm in LoRa using Arduino☆14Apr 26, 2018Updated 7 years ago
- My NaNoGenMo 2014 project: a generative detective comic☆16Nov 22, 2014Updated 11 years ago
- Co-reference resolution for the English language.☆17Jan 12, 2015Updated 11 years ago
- A simple scheduler that outputs a schedule given a todo list.☆24Nov 22, 2014Updated 11 years ago
- utility to fetch provenance information from Internet Archive's Wayback Machine☆14Feb 5, 2026Updated last month
- Linking Entities in CommonCrawl Dataset onto Wikipedia Concepts☆59Sep 5, 2012Updated 13 years ago
- JAWS is "Just A Web Shell" framework for delivering Force.com web applications to iOS (iPhone/iPad) devices.☆14Mar 27, 2011Updated 15 years ago
- A simple authentication library for ember-cli-cordova applications☆20May 13, 2015Updated 10 years ago
- A docker image for Omeka S - does not include either modules or themes, just Omeka itself☆17Mar 15, 2026Updated 2 weeks ago
- A Rails adapter for test-unit☆11Nov 22, 2025Updated 4 months ago
- Cloud Computing library for erlang -- Official repository is now https://github.com/erlcloud/erlcloud☆17Jan 24, 2017Updated 9 years ago