WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and Apache PDFBox
☆43 Updated 4 years ago
Alternatives and similar repositories for pdfindexer:
Users who are interested in pdfindexer compare it to the libraries listed below.
- ☆37 Updated 9 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 Updated 7 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities ☆26 Updated 2 months ago
- Java port of TLSH (Trend Micro Locality Sensitive Hash) ☆20 Updated 3 years ago
- Quick demos using the Toolkit ☆93 Updated 2 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆45 Updated 3 years ago
- This is a fact-based Question Answering System using Apache Solr as the backend search engine, Wikipedia dumps as the information source, Apache … ☆26 Updated 2 years ago
- Provenance: Linking and Understanding Sources ☆17 Updated 10 months ago
- ☆49 Updated 8 years ago
- Cytoscape 3 desktop version. ☆17 Updated 4 months ago
- Fusion demo app searching open-source project data from the Apache Software Foundation ☆42 Updated 6 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 Updated 9 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 Updated 3 years ago
- JDBC driver for data.world ☆18 Updated 5 months ago
- Python bindings for Neo4j ☆26 Updated 10 years ago
- An HTML to AsciiDoc converter written in JavaScript ☆23 Updated 10 years ago
- ☆19 Updated 7 years ago
- Fast in-memory graph structure, powering Gephi ☆74 Updated 4 months ago
- Apache OpenNLP Sandbox ☆42 Updated this week
- A Query Autofiltering SearchComponent for Solr that can translate free-text queries into structured queries using index metadata ☆28 Updated 6 years ago
- D3 and Play based visualization for entity-relation graphs, especially for NLP and information extraction ☆29 Updated 9 years ago
- Parse GraphML files in Python. ☆59 Updated last year
- AsciiDoc Builder and Writer for Sphinx ☆20 Updated 6 years ago
- Blazegraph Tinkerpop3 Implementation ☆61 Updated 4 years ago
- This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading … ☆17 Updated last year
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Updated 8 years ago
- Home of RDF2Go and RDFReactor ☆13 Updated 8 years ago
- A Java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str… ☆16 Updated 8 years ago
- For interacting with Nutch via Python ☆24 Updated last month
- Open Source, Distributed, Big Data Enterprise Search Engine ☆69 Updated 2 weeks ago