Living-with-machines / deduplify
A Python tool to search for and remove duplicated files in messy datasets
☆16 Updated 4 months ago
Alternatives and similar repositories for deduplify:
Users who are interested in deduplify are comparing it to the repositories listed below.
- MediaScape project researching the utility of Generous Interfaces for audiovisual archives ☆10 Updated 3 months ago
- OpenRefine reconciler for Research Organization Registry ☆13 Updated last month
- OpenAIRE Guidelines for Literature Repository Managers based on Dublin Core and DataCite Metadata Kernel ☆13 Updated last year
- Metadata Quality Assessment Framework API ☆18 Updated last week
- Heritage Connector: Transforming text into data to extract meaning and make connections ☆24 Updated 2 years ago
- Bagit-based data packaging specification for dissemination of research data with useful human and machine readable metadata: "Make Data C… ☆39 Updated 5 years ago
- Web application for distributed compute analysis of Archive-It web archive collections. ☆18 Updated last month
- Source code of BARTOC.org user interface ☆25 Updated last week
- A curated list of software, tools, resources and projects by and for libraries. ☆16 Updated 4 years ago
- Light-weight Linked Open Data native cataloguing and crowdsourcing platform ☆18 Updated 2 months ago
- Public-facing data for the US Archives RepoData project ☆17 Updated last year
- DEPRECATED - no longer actively maintained. Automated workflow for harvesting, transforming and indexing of metadata using metha, OpenRef… ☆19 Updated 5 years ago
- OpenRefine command-line interface written in Bash (💎+🤖). Supports batch processing (import, transform, export). ☆16 Updated 2 months ago
- ShEx interpreter for ShEx 2.0 ☆25 Updated 2 years ago
- Instructions, exercises and example data sets for Annif hands-on tutorial ☆40 Updated this week
- An open source set of decks for learning about digital preservation. ☆23 Updated 5 years ago
- Omeka S module for describing resources using values and URIs from Wikidata ☆21 Updated 3 weeks ago
- Simple command line oai-pmh harvester written in Python. ☆41 Updated 2 years ago
- ☆28 Updated 7 years ago
- Small Python library to validate persistent identifiers used in scholarly communication. ☆29 Updated last month
- DatAasee - A Metadata-Lake for Libraries ☆14 Updated 7 months ago
- Mario is a metadata processing pipeline that will process data from various sources and write to Elasticsearch ☆13 Updated 2 years ago
- A Data Parsing/Data Manipulation Tool Supporting Digitization Projects and Other Data Analysis Projects ☆47 Updated 5 years ago
- 💠 An index for linked open data & standard knowledge descriptions (ontologies, vocabularies, shapes, queries, mappings) ☆42 Updated last year
- Integrated CSV to RDF converter, using CSVW and nanopublications ☆47 Updated 11 months ago
- For working on the recipes ☆40 Updated this week
- A platform-agnostic, configurable, and brandable SPARQL editor and visualization interface. ☆13 Updated 3 weeks ago
- Web application to try out reconciliation services interactively ☆13 Updated last week
- Rails application with Blazegraph for managing controlled vocabularies in RDF. ☆22 Updated last year
- Library Carpentry: OpenRefine ☆53 Updated last week