WARC (Web Archive) Input and Output Formats for Hadoop
☆37Dec 7, 2014Updated 11 years ago
Alternatives and similar repositories for warc-hadoop
Users that are interested in warc-hadoop are comparing it to the libraries listed below.
- A flexible pure-Java OCR implementation. Eventually.☆20Jan 2, 2015Updated 11 years ago
- Java implementation of the Internet Research Lab Web Crawler (IRLbot) as presented by Hsin-Tsang Lee, Derek Leonard, Xiaoming Wang, and D…☆17May 25, 2017Updated 8 years ago
- Rainfall is an extensible Java framework to implement custom DSL-based stress and performance tests☆12Mar 31, 2026Updated last month
- This is a TREC evaluation demonstration written for a lecture on information retrieval evaluation.☆24Feb 12, 2018Updated 8 years ago
- ☆19Feb 7, 2016Updated 10 years ago
- A set of reusable Java components that implement functionality common to any web crawler☆257Updated this week
- TREC Core track☆11Jul 5, 2017Updated 8 years ago
- bindings to some parts of opencv to lua+torch☆15Feb 14, 2013Updated 13 years ago
- ☆13Nov 30, 2015Updated 10 years ago
- pymur is a Python interface to The Lemur Toolkit.☆19Sep 17, 2018Updated 7 years ago
- A collection of demonstration languages in Lua/Terra suitable for learning or for forking when creating a new language☆11Aug 27, 2015Updated 10 years ago
- A reinforcement learning package implemented in Torch☆11Jan 24, 2016Updated 10 years ago
- Launch AWS Elastic MapReduce jobs that process Common Crawl data.☆49Feb 15, 2017Updated 9 years ago
- The shared memory version of the Alternating Directions Implicit Solver for Isogeometric Analysis☆10Jan 26, 2019Updated 7 years ago
- Java library for object oriented exception handling☆17Jun 7, 2018Updated 7 years ago
- Common web archive utility code.☆63May 2, 2026Updated 2 weeks ago
- Concurrent and distributed Prolog via join patterns (join calculus)☆12Mar 10, 2015Updated 11 years ago
- Lambda Function to extract EXIF data from images uploaded to an S3 bucket and store it in DynamoDB.☆15Aug 17, 2018Updated 7 years ago
- Evaluation Kit of Joint Recovery of Dense Correspondence and Cosegmentation in Two Images (CVPR 2016)☆12Apr 25, 2018Updated 8 years ago
- Application for ground-truthing semantic segmentation datasets in PyQt4/OpenCV.☆11Aug 15, 2017Updated 8 years ago
- Pure JAX-RS 2.0 ClientRequestFilter/WriterInterceptor used to sign AWS REST requests. Also has presign capabilities.☆15Jan 4, 2022Updated 4 years ago
- Hi Spring fans! Welcome to another super short mid-season interregnum installment of Spring Tips in which I introduce a *super* prelimina…☆12Mar 21, 2019Updated 7 years ago
- Arteria is a high performance message channel system for IPC and network communication☆12Jun 21, 2017Updated 8 years ago
- Example source for MongoDB / JavaScript snippets☆27Mar 11, 2013Updated 13 years ago
- MySQL UDF executing Lua code with storage engine API☆19May 18, 2017Updated 9 years ago
- The Architecture of Open Source Applications☆13Nov 24, 2013Updated 12 years ago
- ☆50Feb 22, 2017Updated 9 years ago
- Warcbase is an open-source platform for managing and analyzing web archives☆162Dec 8, 2017Updated 8 years ago
- TREC Real-Time Summarization Tools☆15Jul 19, 2017Updated 8 years ago
- Deep Learning (PyTorch) Models Deployment using SQL databases☆10Jul 25, 2021Updated 4 years ago
- New nixnote is cloned on miurahr/nixnote2 ... Nixnote (formerly nevernote) is an incomplete Evernote OSS client. Here is a development branch…☆19Feb 16, 2013Updated 13 years ago
- S1P demo for the power of Reactor Netty and Reactor Kafka in order to build Reactive System☆13May 28, 2019Updated 6 years ago
- This is the source code accompanying my blog post explaining the upside of using pure functions in Java.☆11Nov 5, 2020Updated 5 years ago
- Spring Data Aerospike☆36Jan 30, 2020Updated 6 years ago
- A web interface for humans to interact with Beads - the issue tracker made for agents https://github.com/steveyegge/beads☆26Oct 16, 2025Updated 7 months ago
- Command Line Tool to Help You Send Newsletters by Email☆14Updated this week
- ☆24Jul 13, 2022Updated 3 years ago
- A TensorFlow 2.0 .whl file compiled with an old processor/computer☆11Dec 12, 2020Updated 5 years ago
- A project to apply a traditional implementation of Slurm on Kubernetes (with some magic)☆11Dec 20, 2017Updated 8 years ago