API definition, resources and reference implementation of URL Frontiers
☆58 · Jan 23, 2026 · Updated last month
Alternatives and similar repositories for url-frontier
Users who are interested in url-frontier are comparing it to the libraries listed below
- Resources for running StormCrawler with Docker services · ☆10 · Nov 10, 2024 · Updated last year
- A scalable, mature and versatile web crawler based on Apache Storm · ☆972 · Updated this week
- Internet Archive's Sparkling Data Processing Library · ☆16 · Mar 3, 2026 · Updated 2 weeks ago
- An evil web server. · ☆13 · May 9, 2015 · Updated 10 years ago
- Easily crawl news portals or blog sites using Storm Crawler. · ☆21 · Nov 15, 2022 · Updated 3 years ago
- Add editing UI and other power-user features to Datasette. · ☆14 · Mar 4, 2023 · Updated 3 years ago
- Download GitHub repositories · ☆12 · May 10, 2025 · Updated 10 months ago
- A Text Classification API in Java originally developed by DigitalPebble Ltd. The API is independent from the ML implementations used and … · ☆48 · Sep 24, 2021 · Updated 4 years ago
- Original GOKb repo - Moving to https://github.com/openlibraryenvironment/gokb · ☆11 · Jan 23, 2018 · Updated 8 years ago
- demos using the OpenRNDR framework · ☆13 · Mar 27, 2020 · Updated 5 years ago
- Storm / Solr Integration · ☆19 · Feb 2, 2024 · Updated 2 years ago
- visualizations/charts for media collections, based on mediainfo · ☆14 · Sep 15, 2022 · Updated 3 years ago
- HTML parser and tag balancer. · ☆19 · Mar 12, 2026 · Updated last week
- The paper repository for "10 quick tips for making your software outlive your job" · ☆19 · Oct 28, 2025 · Updated 4 months ago
- MftReader is a Command-Line interface (CLI) program which reads the Master File Table (MFT) from NTFS volume. (C# Implementation with PIn… · ☆12 · Sep 13, 2018 · Updated 7 years ago
- My attempt to learn more than one Deep Learning framework · ☆15 · Apr 7, 2019 · Updated 6 years ago
- The BES framework, which forms the basis for the Hyrax server · ☆16 · Mar 13, 2026 · Updated last week
- Snowball Stemmer for Clojure · ☆18 · Jun 7, 2022 · Updated 3 years ago
- Docker container for ocropus3 OCR system · ☆12 · Aug 19, 2018 · Updated 7 years ago
- The overarching project of Java code related to the Open Provenance specifications. · ☆25 · Apr 18, 2011 · Updated 14 years ago
- Useful tools to extract malayalam text from the Common Crawl Datasets · ☆28 · Dec 11, 2024 · Updated last year
- A dataset downloaded from the deep and scientific web across three major Polar data centers for use in research. · ☆13 · Sep 8, 2017 · Updated 8 years ago
- Create CovJSON files from common scientific data formats · ☆14 · Apr 24, 2018 · Updated 7 years ago
- ifcParserLib is a set of reusable Java components that implement functionality for IFC file parsing. · ☆10 · Oct 14, 2020 · Updated 5 years ago
- Takes query parameters from a url to create the first cell of a jupyter notebook. · ☆17 · Nov 13, 2024 · Updated last year
- Specification for a query language to request Verifiable Presentations from wallets etc. · ☆10 · Jan 13, 2026 · Updated 2 months ago
- Mirror of Apache Edgent (Incubating) Samples · ☆15 · Feb 14, 2018 · Updated 8 years ago
- RESTful wrapper for the Joshua machine translation decoder · ☆14 · Oct 25, 2016 · Updated 9 years ago
- Ready-to-use examples of dkpro-core components and pipelines. · ☆35 · Dec 16, 2023 · Updated 2 years ago
- Open-Source Information Retrieval Reproducibility Challenge · ☆51 · Jan 11, 2016 · Updated 10 years ago
- An OpenStreetMap Visualization Toolkit for Python · ☆30 · Dec 18, 2017 · Updated 8 years ago
- A set of machine learning experiments in Clojure · ☆30 · Nov 30, 2012 · Updated 13 years ago
- SCAlable Preservation Environments · ☆40 · Jul 7, 2022 · Updated 3 years ago
- Easily add authentication to your postgrest API · ☆25 · Feb 14, 2019 · Updated 7 years ago
- IPLD Schema Implementation: parser and utilities · ☆16 · Mar 6, 2026 · Updated 2 weeks ago
- Experimental collections library · ☆14 · Mar 27, 2019 · Updated 6 years ago
- A platform for collecting, analyzing, and visualizing social media data. · ☆13 · Dec 27, 2020 · Updated 5 years ago
- Mirror of Apache Pony Mail (Incubating) Site · ☆13 · Jul 19, 2024 · Updated last year
- Comment feature for WeChat Official Accounts · ☆11 · Oct 29, 2019 · Updated 6 years ago