WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and Apache PDFBox
☆43 · Updated 4 years ago
Alternatives and similar repositories for pdfindexer
Users who are interested in pdfindexer are comparing it to the libraries listed below.
- ☆38 · Updated 9 years ago
- An HTML to Asciidoc converter written in JavaScript ☆23 · Updated 10 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 · Updated 8 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 · Updated 7 years ago
- A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML ☆15 · Updated 8 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities ☆26 · Updated 3 months ago
- ☆25 · Updated 9 years ago
- Provenance: Linking and Understanding Sources ☆17 · Updated 11 months ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆45 · Updated 3 years ago
- JDBC driver for data.world ☆18 · Updated 7 months ago
- Markdown to Asciidoc Converter for Java ☆13 · Updated 10 years ago
- an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki) ☆54 · Updated 7 years ago
- A java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str… ☆16 · Updated 8 years ago
- Core API for Silverpeas ☆49 · Updated this week
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser ☆13 · Updated 3 years ago
- ☆49 · Updated 8 years ago
- Text Mining Library with a focus on Latent Semantic Analysis ☆12 · Updated 11 years ago
- ☆19 · Updated 7 years ago
- Library for building reproducible data pipelines to support experimentation ☆20 · Updated 9 years ago
- ☆25 · Updated 8 years ago
- ☆20 · Updated 8 years ago
- AngularJS Solr, Elasticsearch and OpenSearch Diagnostic Search Services ☆26 · Updated 2 months ago
- Mirror of Apache Marmotta ☆54 · Updated 5 years ago
- Fusion demo app searching open-source project data from the Apache Software Foundation ☆42 · Updated 6 years ago
- ☆13 · Updated last year
- A cookiecutter template for Apache Spark applications written in Scala ☆10 · Updated 6 years ago
- Blazegraph Tinkerpop3 Implementation ☆61 · Updated 4 years ago
- Apache NiFi Custom Processor Extracting Text From Files with Apache Tika ☆35 · Updated last year
- Keyword Extraction system using Brown Clustering (this version is trained to extract keywords from job listings) ☆18 · Updated 10 years ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆71 · Updated last year