WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and PDFBox
☆44 Updated 4 years ago
Alternatives and similar repositories for pdfindexer
Users who are interested in pdfindexer are comparing it to the libraries listed below.
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 7 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities ☆26 Updated 4 months ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆17 Updated 9 years ago
- Demonstration of searching PDF document with Solr, Tika, and Tesseract ☆31 Updated 8 months ago
- Core API for Silverpeas ☆50 Updated last week
- Fusion demo app searching open-source project data from the Apache Software Foundation ☆42 Updated 6 years ago
- Java port of TLSH (Trend Micro Locality Sensitive Hash) ☆20 Updated 4 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆34 Updated 2 years ago
- ☆25 Updated 9 years ago
- Geographic Place, Date/time, and Pattern entity extraction toolkit along with text extraction from unstructured data and GIS outputters. ☆44 Updated last month
- Fast in-memory graph structure, powering Gephi ☆75 Updated last month
- A library to store metadata of relational databases including the schema, statistics, and integrity constraints. ☆25 Updated 6 years ago
- Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to netw… ☆22 Updated 8 months ago
- Cloudfier is a model-driven tool for rapid development of business applications ☆22 Updated 2 weeks ago
- RDF store on a cloud-based architecture (previously on https://code.google.com/p/cumulusrdf) ☆31 Updated 9 years ago
- Provenance: Linking and Understanding Sources ☆17 Updated last year
- A java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str… ☆16 Updated 8 years ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆72 Updated last year
- Files for the Karma tutorial at TCDL, Texas Conference on Digital Libraries ☆29 Updated 9 years ago
- Uses Apache Lucene, OpenNLP and geonames and extracts locations from text and geocodes them. ☆37 Updated last year
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 2 years ago
- A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata ☆28 Updated 6 years ago
- The OpenSextant Gazetteer is a collection of world-wide place name data ☆12 Updated 7 years ago
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services ☆26 Updated 3 months ago
- A repo that contains outgoing links from DBpedia ☆50 Updated 5 years ago
- General Architecture for Text Engineering ☆49 Updated 9 years ago
- A curated list of Awesome Apache Solr links and resources. ☆109 Updated 3 years ago
- sparql-stream sensor queries ☆16 Updated 8 years ago
- A web tool enabling authorship and download of RDF, and RDF visualization in Linked Open Data ☆37 Updated 5 years ago