WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and Apache PDFBox
☆43Updated 4 years ago
Alternatives and similar repositories for pdfindexer:
Users who are interested in pdfindexer are comparing it to the libraries listed below.
- ☆36Updated 9 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆25Updated 7 years ago
- Quickly analyze and explore email with advanced analytics and visualization.☆56Updated 3 years ago
- ☆25Updated 9 years ago
- Comparison of BPMN tools☆43Updated 2 weeks ago
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)☆65Updated 8 years ago
- A library for extracting tables from PDF files☆90Updated 11 years ago
- Grok is a simple tool that allows you to easily parse logs☆38Updated 10 years ago
- Text Mining Library with a focus on Latent Semantic Analysis☆12Updated 11 years ago
- Detect memory leaks in minutes without a heap dump.☆17Updated 7 years ago
- Your "yellow pages" of Enterprise Free Software Publishers, their products and success cases☆17Updated 7 months ago
- A Java library for working with Frictionless Data Data Packages.☆21Updated last year
- High-performance, portable and configurable desktop search application / information retrieval system☆28Updated 4 years ago
- Quick demos using the Toolkit☆93Updated 2 years ago
- Cytoscape 3 desktop version.☆17Updated 3 months ago
- ☆20Updated 7 years ago
- Parse Wikipedia dumps and index (some) page data to Elasticsearch☆49Updated 9 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess…☆16Updated 9 years ago
- A Java library for creating standalone, portable, schema-full object databases supporting pagination and faceted search, and offering str…☆16Updated 7 years ago
- ☆49Updated 7 years ago
- 📘 A Citation Style Language (CSL) processor for Java.☆90Updated 2 months ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆45Updated 3 years ago
- Java port of TLSH (Trend Micro Locality Sensitive Hash)☆20Updated 3 years ago
- Cloudfier is a model-driven tool for rapid development of business applications☆22Updated 3 weeks ago
- A curated list of Awesome Apache Solr links and resources.☆107Updated 3 years ago
- AngularJS diagnostic search services for Solr, Elasticsearch, and OpenSearch☆26Updated 8 months ago
- OptaPlanner workbench 7.x: OptaPlanner extensions to the KIE Workbench☆24Updated last year
- Scraper-related helper functions☆27Updated 10 years ago
- Pre-trained models for Datumbox Machine Learning Framework.☆15Updated 3 years ago
- Open Source, Distributed, Big Data Enterprise Search Engine☆69Updated this week