yahoo / tagchowder
Parsing and extracting information from (possibly malformed) HTML/XML documents
☆10 Updated 11 months ago
Alternatives and similar repositories for tagchowder:
Users who are interested in tagchowder are comparing it to the libraries listed below.
- DuraCloud open source project ☆16 Updated 3 weeks ago
- Solr Relevance Ranking Analysis and Visualization Tool ☆17 Updated 5 years ago
- A framework to allow the matching of string entities using customised sets of transformations and matchers, plus a tool to produce the ne… ☆31 Updated 7 years ago
- Common web archive utility code. ☆55 Updated 3 weeks ago
- DBpedia, which frequently crawls and analyses over 120 Wikipedia language editions, has near complete information about (1) which facts ar… ☆11 Updated 2 years ago
- Mirror of Apache OpenNLP Add-ons ☆17 Updated this week
- This is a Fact based Question Answering System using Apache Solr as backend search engine, Wikipedia dumps as information source, Apache … ☆26 Updated 2 years ago
- SKOS Support for Apache Lucene and Solr ☆56 Updated 3 years ago
- Fcrepo4 webapp plus optional fcrepo dependencies ☆13 Updated 4 years ago
- Translation of query languages to serialized KoralQuery protocol ☆11 Updated this week
- An HTTP proxy for Elasticsearch, Solr (etc.) to prevent a 100% full disk situation. ☆11 Updated 6 years ago
- Search engine for structured data ☆23 Updated 3 weeks ago
- A toolkit for clustering web pages based on various similarity measures. ☆33 Updated 3 years ago
- Highly performant, lightweight framework for linked data processing. Supports RDFa, JSON-LD, RDF/XML and plain text formats, runs on Andr… ☆52 Updated 2 years ago
- Example SPARQL queries, mostly for working with ZBW data sets ☆16 Updated 7 months ago
- Extract Data from Wikipedia Lists ☆31 Updated 7 years ago
- DKPro C4CorpusTools is a collection of tools for processing CommonCrawl corpus, including Creative Commons license detection, boilerplate… ☆52 Updated 4 years ago
- Wikidata authority file mapping tool ☆11 Updated 6 years ago
- Automatic tagging and analysis of documents in an Apache Solr index for faceted search by RDF(S) Ontologies & SKOS thesauri ☆47 Updated 3 years ago
- Virtual patent marking crawler at iproduct.epfl.ch ☆14 Updated 7 years ago
- Zulia Search Engine ☆32 Updated 2 weeks ago
- Small scripts for processing Solr files ☆10 Updated last year
- A library to implement event-sourcing microservices ☆16 Updated this week
- Text conversion tool (from e.g. Word, HTML, txt) to corpus formats TEI or FoLiA ☆23 Updated 3 years ago
- A tool for calculating semantic similarity between words from a text corpus based on lexico-syntactic patterns. ☆27 Updated 9 years ago
- Specification of Document Availability Information (DAIA) ☆18 Updated 7 years ago
- Europeana Cloud is Europeana’s new cloud-based infrastructure for storing and sharing cultural heritage data. It is currently in internal… ☆26 Updated last month
- This is the ETL lib package. It provides an API to munge and prepare JSON, TSV and other data using Apache Tika and JSON parsing/loading … ☆17 Updated last year
- A smart distributed crawler that infers navigation models of structured websites, used to cluster pages based on their structure and extr… ☆9 Updated 4 years ago
- Promoss Topic Modelling Toolbox ☆11 Updated 6 years ago