WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and PDF Box
☆43, updated 2 months ago
Alternatives and similar repositories for pdfindexer
Users who are interested in pdfindexer are comparing it to the libraries listed below.
- An HTML to Asciidoc converter written in JavaScript (☆23, updated 10 years ago)
- (☆39, updated 10 years ago)
- A course on free/libre and open source software (☆11, updated 2 months ago)
- Fusion demo app searching open-source project data from the Apache Software Foundation (☆43, updated 7 years ago)
- Fast in-memory graph structure, powering Gephi (☆78, updated this week)
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. (☆34, updated 2 years ago)
- Wandora is a general purpose information extraction, management and publishing application based on Topic Maps and Java. (☆134, updated 2 years ago)
- Quick demos using the Toolkit (☆96, updated 2 years ago)
- Gephi Toolkit - All Gephi in a Library (☆182, updated last year)
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… (☆47, updated 4 years ago)
- Quickly analyze and explore email with advanced analytics and visualization. (☆55, updated 4 years ago)
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) (☆25, updated 8 years ago)
- Preliminary Solr DQ / Data Quality experiments and prototype, and SolrJ wrapper utilities (☆26, updated 11 months ago)
- This is a Fact based Question Answering System using Apache Solr as backend search engine, Wikipedia dumps as information source, Apache … (☆26, updated last month)
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… (☆276, updated 3 years ago)
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… (☆17, updated 10 years ago)
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… (☆72, updated last year)
- Suite of tools for detecting changes in web pages and their rendering (☆55, updated 2 years ago)
- Detect memory leaks in minutes without a heap dump. (☆17, updated 8 years ago)
- A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML (☆15, updated 9 years ago)
- A tool to generate UML class diagrams from JSON schema documents (☆39, updated 5 years ago)
- Solr Dictionary Annotator (Microservice for Spark) (☆71, updated 5 years ago)
- Angular JS Solr and Elasticsearch and OpenSearch Diagnostic Search Services (☆28, updated last week)
- A library for extracting tables from PDF files (☆89, updated 12 years ago)
- Blazegraph Tinkerpop3 Implementation (☆62, updated 5 years ago)
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) (☆65, updated 9 years ago)
- Norconex Crawlers (or spiders) are flexible web and filesystem crawlers for collecting, parsing, and manipulating data from the web or fi… (☆197, updated last week)
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali… (☆89, updated 5 years ago)
- Demonstration of searching PDF document with Solr, Tika, and Tesseract (☆32, updated last year)
- Quickly turn command-line applications into RESTful webservices with a web-application front-end. You provide a specification of your com… (☆134, updated 2 months ago)