WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and PDFBox
☆44 Updated last month
Alternatives and similar repositories for pdfindexer
Users who are interested in pdfindexer are comparing it to the libraries listed below.
- ☆38 Updated 10 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆47 Updated 3 years ago
- Cytoscape 3 desktop version. ☆17 Updated last month
- An HTML to Asciidoc converter written in JavaScript ☆23 Updated 10 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆17 Updated 10 years ago
- A course on free/libre and open source software ☆11 Updated last month
- Quick demos using the Toolkit ☆96 Updated 2 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆25 Updated 8 years ago
- Wandora is a general purpose information extraction, management and publishing application based on Topic Maps and Java. ☆133 Updated 2 years ago
- A set of widgets for Python's Orange Machine Learning to work with Apache Spark ML ☆15 Updated 8 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆275 Updated 3 years ago
- Quickly analyze and explore email with advanced analytics and visualization. ☆56 Updated 4 years ago
- Files for the Karma tutorial at TCDL, Texas Conference on Digital Libraries ☆29 Updated 9 years ago
- Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text. ☆34 Updated 2 years ago
- A tool to generate UML class diagrams from JSON schema documents ☆39 Updated 5 years ago
- Make graphs you can play with... Web app in Flask and Bootstrap to fetch Zotero datasets and then create graph visualizations with d3.js ☆22 Updated 7 years ago
- The Sweble Wikitext Components module provides a parser for MediaWiki's wikitext and an engine trying to emulate the behavior of a MediaW… ☆73 Updated last year
- Core API for Silverpeas ☆51 Updated 2 weeks ago
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali… ☆87 Updated 5 years ago
- 📘 A Citation Style Language (CSL) processor for Java. ☆98 Updated last week
- Tika-Similarity uses the Tika-Python package (Python port of Apache Tika) to compute file similarity based on Metadata features. ☆108 Updated 7 months ago
- Python/Django based webapps and web user interfaces for search, structure (meta data management like thesaurus, ontologies, annotations a… ☆99 Updated 3 years ago
- Gephi Toolkit - All Gephi in a Library ☆179 Updated last year
- Simple and lightweight time tracking for individuals and teams. ☆116 Updated 4 months ago
- Simple taxonomy management tool and document classifier. ☆56 Updated 5 years ago
- Common Crawl Index Server ☆71 Updated 8 months ago
- ImageCat is an Apache OODT RADIX application that uses Apache Solr, Apache Tika and Apache OODT to ingest 10s of millions of files (image… ☆96 Updated 7 years ago
- Cloudfier is a model-driven tool for rapid development of business applications ☆22 Updated 2 months ago
- Textricator is a tool to extract text from documents and generate structured data. ☆350 Updated 8 months ago
- Fast in-memory graph structure, powering Gephi ☆74 Updated 3 weeks ago