WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and Apache PDFBox
☆43 · Updated 4 years ago
Related projects
Alternatives and complementary repositories for pdfindexer
- Python bindings for Neo4j ☆26 · Updated 10 years ago
- ☆36 · Updated 9 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 · Updated 7 years ago
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities ☆26 · Updated 2 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆14 · Updated 9 years ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆64 · Updated 8 years ago
- DBpedia Distributed Extraction Framework: Extract structured data from Wikipedia in a parallel, distributed manner ☆41 · Updated 2 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆46 · Updated 2 years ago
- Suite of tools for detecting changes in web pages and their rendering ☆53 · Updated 11 months ago
- Java library to interface with OpenML ☆10 · Updated last month
- Online service for analyzing research profiles of scientists and conferences ☆12 · Updated 2 years ago
- Work in progress: a new visualization engine ☆34 · Updated 5 months ago
- Java port of TLSH (Trend Micro Locality Sensitive Hash) ☆20 · Updated 3 years ago
- stav text annotation visualiser ☆34 · Updated 13 years ago
- Parse Wikipedia dumps and index (some) page data into Elasticsearch ☆49 · Updated 9 years ago
- Extract statistics from Wikipedia dump files ☆26 · Updated 3 years ago
- Quickly analyze and explore email with advanced analytics and visualization ☆55 · Updated 3 years ago
- Common web archive utility code ☆50 · Updated last month
- A workflow system for Natural Language Processing ☆21 · Updated 5 years ago
- For interacting with Apache Nutch via Python ☆23 · Updated 3 weeks ago
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali… ☆78 · Updated 4 years ago
- ☆25 · Updated 8 years ago
- Apache UIMA Java SDK ☆64 · Updated this week
- A toolkit for clustering web pages based on various similarity measures ☆32 · Updated 3 years ago
- Visualization of results returned by Solr 6 graph queries ☆10 · Updated 8 years ago
- Stanford CoreNLP NER add-on for Apache Tika's NamedEntityParser ☆13 · Updated 2 years ago
- Python wrapper for Apache Tika, made to be easy_installed ☆25 · Updated 12 years ago
- AngularJS Solr, Elasticsearch and OpenSearch diagnostic search services ☆25 · Updated 5 months ago