crawler-commons / url-frontier
API definition, resources and reference implementation of URL Frontiers
☆47 · Updated this week
Alternatives and similar repositories for url-frontier:
Users that are interested in url-frontier are comparing it to the libraries listed below
- Common web archive utility code. ☆52 · Updated last month
- Java library for reading and writing WARC files with a typed API ☆49 · Updated last month
- Advanced desktop search/corpus exploration prototype ☆21 · Updated 3 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 · Updated 7 years ago
- ☆48 · Updated 7 years ago
- Query preprocessor for Java-based search engines (Querqy Core and Solr implementation) ☆183 · Updated last week
- Demonstration of searching PDF document with Solr, Tika, and Tesseract ☆30 · Updated 3 months ago
- WARC and ARC indexing and discovery tools. ☆118 · Updated 5 months ago
- KnowledgeStore ☆20 · Updated 6 years ago
- Search Management UI ☆53 · Updated last month
- A Java library for working with Frictionless Data Data Packages. ☆20 · Updated last year
- Java port of SymSpell: 1 million times faster through Symmetric Delete spelling correction algorithm ☆66 · Updated 4 years ago
- A search interface and wayback machine for the UKWA Solr based warc-indexer framework. ☆105 · Updated this week
- A fast and simple JavaScript library specifically targeted at collecting search and search related browser events. ☆41 · Updated 5 months ago
- TheMovieDB in Solr ☆21 · Updated 6 months ago
- Document Ingestion Framework for Search Systems ☆34 · Updated 2 weeks ago
- Highly performant, lightweight framework for linked data processing. Supports RDFa, JSON-LD, RDF/XML and plain text formats, runs on Andr… ☆51 · Updated 2 years ago
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services ☆25 · Updated 6 months ago
- Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archive… ☆24 · Updated 2 years ago
- Tools for bulk indexing of WARC/ARC files on Hadoop, EMR or local file system. ☆43 · Updated 7 years ago
- Efficient indexing and retrieval of OCR bounding boxes in Solr ☆22 · Updated 5 years ago
- A small tool which uses the CommonCrawl URL Index to download documents with certain file types or mime-types. This is used for mass-test… ☆64 · Updated 3 weeks ago
- Easily crawl news portals or blog sites using Storm Crawler. ☆20 · Updated 2 years ago
- Towards an open source stack for e-commerce search ☆145 · Updated last month
- Zulia Search Engine ☆32 · Updated last week
- Vector Plugin for Solr: calculate dot product / cosine similarity on documents ☆14 · Updated 6 years ago
- Github mirror of "search/extra" - our actual code is hosted with Gerrit (please see https://www.mediawiki.org/wiki/Developer_access for c… ☆53 · Updated 2 months ago
- functionality on top of an RDF store while accounting for and exploiting the fundamental differences between graph storage and relation… ☆12 · Updated 10 months ago
- DKPro JWPL (DKPro Java Wikipedia Library) is a free, Java-based application programming interface that facilitates access to all informat… ☆82 · Updated 2 months ago
- Search relevance evaluation toolkit ☆31 · Updated 2 years ago