pawelrychlik / duplitector
A duplicate data detector engine PoC based on Elasticsearch.
☆20Updated 10 years ago
Alternatives and similar repositories for duplitector
Users that are interested in duplitector are comparing it to the libraries listed below
- A command line and Python client for Open-Spending☆10Updated 7 years ago
- common data interchange format for document processing pipelines that apply natural language processing tools to large streams of text☆35Updated 8 years ago
- FacetView is a pure javascript frontend for ElasticSearch.☆291Updated 10 years ago
- Analyze standard numbers like ARK, DOI, EAN, GTIN, IBAN, ISAN, ISBN, ISMN, ISNI, ISSN, ISTC, ISWC, ORCID, PPN, SICI, UPC, ZDB with Elasti…☆24Updated 9 years ago
- Python interface for OrientDB binary Serialization☆10Updated 5 years ago
- files and code related to the Early Modern OCR Project (eMOP) at the IDHMC☆16Updated 10 years ago
- ScaleGraph is an X10 billion scale graph analysis library.☆21Updated 9 years ago
- European Parliament website Python scraper☆12Updated 8 years ago
- Handle linguistic corpus and convert it to use NLP tools☆20Updated 12 years ago
- ☆10Updated 10 years ago
- a pure javascript frontend for ElasticSearch search indices.☆80Updated 7 years ago
- memcached transport plugin for elasticsearch (STOPPED)☆34Updated 2 years ago
- A stemmer for Slovak language☆12Updated 8 years ago
- A multi-threaded Python based crawler making use of Splash to render JavaScript.☆10Updated 7 years ago
- code to remove "noise" from hOCR output of Tesseract OCR.☆14Updated 8 years ago
- This is where you can find all the code you need for the ELUNA 2019 Developers Day+ Alma Workshop.☆10Updated 6 years ago
- A web platform for collaboratively validating the provisional vote count☆20Updated 3 years ago
- Curiosity is a generic frontend for facetting, displaying and editing data from any elasticsearch index.☆76Updated 9 years ago
- Tiny static site generator☆10Updated 2 years ago
- A Relaxed Schema Graph Database Management System☆53Updated 5 years ago
- Tools for TICCL☆14Updated last month
- An attempt at creating a silver/gold standard dataset for backtesting yesterday & today's content-extractors☆35Updated 10 years ago
- An extension to the demo template of ElasticUI a beautiful AngularJS frontend to ElasticSearch for faceted navigation☆39Updated 10 years ago
- Brand disambiguator for tweets to differentiate e.g. Orange vs orange (brand vs foodstuff), using NLTK and scikit-learn☆57Updated 12 years ago
- Facilitates the indexing of content from a CSV into ElasticSearch☆26Updated 11 years ago
- Docker container to provide Apache Tika RESTful API☆41Updated 9 years ago
- Term List Matching Plugin for ElasticSearch☆26Updated 11 years ago
- Semanticizest: dump parser and client☆20Updated 9 years ago
- A semantic analysis tool to generate synonym.txt files for Solr. [RETIRED]☆24Updated 8 years ago
- mltk - Moz Language Tool Kit☆12Updated 10 years ago