WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and PDFBox
☆43 Updated 4 years ago
Alternatives and similar repositories for pdfindexer:
Users who are interested in pdfindexer are comparing it to the repositories listed below.
- ☆38 Updated 9 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX) ☆24 Updated 7 years ago
- Cloudfier is a model-driven tool for rapid development of business applications ☆22 Updated 2 months ago
- A course on free/libre and open source software ☆10 Updated last year
- Netarchivesuite development ☆19 Updated 3 weeks ago
- Core API for Silverpeas ☆49 Updated last week
- Resource scheduling and event planning ☆63 Updated 3 weeks ago
- Babel Street Analytics Client Library for Java ☆11 Updated last week
- Python bindings for Neo4j ☆26 Updated 10 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders… ☆45 Updated 3 years ago
- CiteSeerX public repository ☆132 Updated 10 months ago
- Common web archive utility code. ☆55 Updated last month
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format) ☆65 Updated 8 years ago
- ☆49 Updated 8 years ago
- WARC (Web Archive) Input and Output Formats for Hadoop ☆35 Updated 10 years ago
- Blazegraph Tinkerpop3 Implementation ☆61 Updated 4 years ago
- an open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki) ☆54 Updated 7 years ago
- CLI implementation of httpreserve that can test links and retrieve internet archive replacements ☆10 Updated 5 months ago
- Quick demos using the Toolkit ☆94 Updated 2 years ago
- Advanced similarity and duplicate source code proof of concept for our research efforts. ☆52 Updated 2 years ago
- Installer for Thymeflow, a personal knowledge management system. ☆33 Updated 7 years ago
- Tool for visualizing hOCR output from Tesseract (or other OCR engines that support hOCR). ☆23 Updated 10 years ago
- Provenance: Linking and Understanding Sources ☆17 Updated 11 months ago
- Java library to interface with OpenML ☆10 Updated 6 months ago
- Unilexicon: Taxonomy editor and tagging suite ☆2 Updated last month
- Quick starts for Teiid WildFly ☆25 Updated 6 years ago
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N… ☆268 Updated 2 years ago
- Elwha is a Java application for monitoring topics, sentiment and events on Twitter streams with the ability to generate notification mess… ☆16 Updated 9 years ago
- Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to netw… ☆22 Updated 7 months ago
- A library to store metadata of relational databases including the schema, statistics, and integrity constraints. ☆25 Updated 6 years ago