pawelrychlik / duplitector
A duplicate data detector engine PoC based on Elasticsearch.
☆20 Updated 10 years ago
Alternatives and similar repositories for duplitector
Users who are interested in duplitector are comparing it to the libraries listed below.
- A command line and Python client for Open-Spending ☆10 Updated 7 years ago
- common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text ☆35 Updated 8 years ago
- Google Drive river for Elasticsearch ☆20 Updated 10 years ago
- FacetView is a pure javascript frontend for ElasticSearch. ☆291 Updated 10 years ago
- Verteego Data Suite ☆10 Updated 8 years ago
- memcached transport plugin for elasticsearch (STOPPED) ☆34 Updated 2 years ago
- Analyze standard numbers like ARK, DOI, EAN, GTIN, IBAN, ISAN, ISBN, ISMN, ISNI, ISSN, ISTC, ISWC, ORCID, PPN, SICI, UPC, ZDB with Elasti… ☆24 Updated 9 years ago
- Python interface for OrientDB binary Serialization ☆10 Updated 5 years ago
- Docker container to provide Apache Tika RESTful API ☆41 Updated 9 years ago
- Handle linguistic corpus and convert it to use NLP tools ☆20 Updated 12 years ago
- A backend service for the Push-Android app to connect and pull data from. ☆10 Updated 2 years ago
- Term List Matching Plugin for ElasticSearch ☆26 Updated 11 years ago
- Tools for scraping the Thomson Reuters (aka ISI) Web of Science ☆20 Updated 10 years ago
- European Parliament website Python scraper ☆12 Updated 8 years ago
- Next-gen web application for public finance data warehouses, formerly OpenSpending ☆57 Updated 3 years ago
- Detective.io is a platform that hosts your investigation and lets you make powerful queries to mine it. Simply describe your field of stu… ☆136 Updated 10 years ago
- Brand disambiguator for tweets to differentiate e.g. Orange vs orange (brand vs foodstuff), using NLTK and scikit-learn ☆57 Updated 12 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED] ☆24 Updated 8 years ago
- WordNet RDF export ☆24 Updated 8 years ago
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors ☆35 Updated 10 years ago
- A command line utility for generating thumbnails, resizing images, and uploading images to Amazon S3 ☆10 Updated 9 years ago
- Curiosity is a generic frontend for facetting, displaying and editing data from any elasticsearch index. ☆75 Updated 9 years ago
- A bundle of useful Elasticsearch plugins ☆112 Updated last year
- An extension to the demo template of ElasticUI a beautiful AngularJS frontend to ElasticSearch for faceted navigation ☆39 Updated 10 years ago
- This is where you can find all the code you need for the ELUNA 2019 Developers Day+ Alma Workshop. ☆10 Updated 6 years ago
- fSphinx easily builds faceted search systems using Sphinx. ☆69 Updated 12 years ago
- Programmatic building of JSON schemas (document and field mappings) with validation. ☆31 Updated 4 years ago
- The first Open Source document analysis platform ☆65 Updated 4 years ago
- List of tools and Utilities for Data and Information Visualization. Ever Expanding list with Insights into some of the most happening Fra… ☆39 Updated 10 years ago
- Simple Hungarian Sentence Analysis with NLTK ☆16 Updated 4 years ago