WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and Apache PDFBox
☆43Updated 4 years ago
Alternatives and similar repositories for pdfindexer:
Users who are interested in pdfindexer are comparing it to the libraries listed below:
- ☆36Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess…☆15Updated 9 years ago
- An HTML to Asciidoc converter written in JavaScript☆23Updated 9 years ago
- Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to netw…☆22Updated 3 months ago
- ☆25Updated 8 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆46Updated 3 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆25Updated 7 years ago
- Quick demos using the Toolkit☆94Updated last year
- ☆48Updated 7 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts.☆52Updated 2 years ago
- Code and templates required to build the DARPA open catalog.☆17Updated 8 years ago
- Provenance: Linking and Understanding Sources☆17Updated 7 months ago
- A Java library for working with Frictionless Data Data Packages.☆20Updated last year
- A Java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str…☆16Updated 7 years ago
- Core API for Silverpeas☆49Updated this week
- Uses Apache Lucene, OpenNLP, and GeoNames to extract locations from text and geocode them.☆36Updated 9 months ago
- Python bindings for Neo4j☆26Updated 10 years ago
- Information extraction and interactive visualization of textual datasets for investigative data-driven journalism and eDiscovery☆53Updated 6 months ago
- Stanford CoreNLP NER addon for Apache Tika's NamedEntityParser☆13Updated 2 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR).☆23Updated 10 years ago
- Keyword Extraction system using Brown Clustering - (This version is trained to extract keywords from job listings)☆18Updated 10 years ago
- Work in progress: a new visualization engine☆34Updated 7 months ago
- An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)☆54Updated 7 years ago
- Fast in-memory graph structure, powering Gephi☆73Updated 2 months ago
- Advanced desktop search/corpus exploration prototype☆21Updated 3 years ago
- ☆13Updated 10 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities☆26Updated 2 years ago
- Talend Component Kit (implementation repository)☆31Updated this week
- Chorus, now for Elasticsearch!☆15Updated 6 months ago