WolfgangFahl / pdfindexer
Index and search PDF files using Apache Lucene and PDF Box
☆41Updated 3 years ago
Related projects:
- Cloudfier is a model-driven tool for rapid development of business applications☆22Updated last week
- An open-source data management platform for knowledge workers (https://github.com/dswarm/dswarm-documentation/wiki)☆54Updated 6 years ago
- An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)☆24Updated 6 years ago
- Norconex Filesystem Collector is a flexible crawler for collecting, parsing, and manipulating data ranging from local hard drives to netw…☆21Updated last year
- Open Semantic Visual Linked Data Graph Explorer: Open Source tool (web app) and user interface (UI) for discovery, exploration and visuali…☆77Updated 4 years ago
- A toolkit for clustering web pages based on various similarity measures.☆32Updated 2 years ago
- Suite of tools for detecting changes in web pages and their rendering☆53Updated 9 months ago
- A repo that contains outgoing links from DBpedia☆50Updated 4 years ago
- Common web archive utility code.☆50Updated last week
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri☆45Updated 2 years ago
- Unilexicon: Taxonomy editor and tagging suite☆0Updated last month
- Apache UIMA Java SDK☆64Updated this week
- A library of examples showing how to use the Common Crawl corpus (2008-2012, ARC format)☆64Updated 8 years ago
- Quick demos using the Toolkit☆95Updated last year
- Python based Open Source ETL tools for file crawling, document processing (text extraction, OCR), content analysis (Entity Extraction & N…☆254Updated last year
- Python/Django based webapps and web user interfaces for search, structure (metadata management like thesaurus, ontologies, annotations a…☆95Updated last year
- RDF store on a cloud-based architecture (previously on https://code.google.com/p/cumulusrdf)☆31Updated 8 years ago
- This repository contains the Domain Discovery Tool (DDT) project. DDT is an interactive system that helps users explore and better unders…☆46Updated 2 years ago
- Mirror of Apache Taverna Engine (incubating)☆16Updated last year
- SOLR bulk indexing utility for the command line.☆45Updated last month
- Tools for exploring the contents of web archive files.☆39Updated 3 years ago
- JSONiq tutorial☆44Updated 2 years ago
- PDF Extraction Toolkit☆41Updated 3 years ago
- Machine learning software for extracting information from scholarly documents☆23Updated 3 years ago
- Spring integration with Stardog RDF database☆17Updated 2 years ago
- A web tool enabling authorship and download of RDF, and RDF visualization in Linked Open Data☆38Updated 5 years ago
- Extract statistics from Wikipedia Dump files.☆26Updated 3 years ago
- Blazegraph Tinkerpop3 Implementation☆59Updated 3 years ago
- Java based GraphViz HTTP Server☆36Updated last year