zyocum/dedup

zyocum / dedup

Find duplicate text files.

☆14

Alternatives and similar repositories for dedup

Users who are interested in dedup are comparing it to the repositories listed below.

bhaddow / pmindia-crawler
Code for extracting parallel corpora from pmindia
☆17 · Jan 28, 2020 · Updated 6 years ago
lkarlslund / stringdedup
String deduplication package for Go
☆19 · Jan 10, 2024 · Updated 2 years ago
memcachier / examples-django
MemCachier Django usage example
☆11 · Nov 29, 2018 · Updated 7 years ago
strakergroup / ltk-filesystem-connector
The Lingotek Filesystem Connector (ltk) links your files and folders to the Translation Network™
☆13 · May 7, 2026 · Updated 2 months ago
smf33 / FlickrFollowerBot
Flickr Follower Bot : Bot for Flickr, in .Net Core, using a Chrome client and Selenium for command it
☆12 · Mar 7, 2021 · Updated 5 years ago
AlissaSabre / disfr
A Windows program to view/examine XLIFF file contents.
☆14 · Sep 26, 2024 · Updated last year
zanata / tennera
Various Java i18n tools, including tools for processing the Gettext and Properties formats
☆15 · May 11, 2021 · Updated 5 years ago
heroku / log2viz
DEFUNCT: Realtime analysis of your Heroku app logs.
☆140 · Oct 15, 2024 · Updated last year
hadley / beautiful-data
Book chapter for beautiful data
☆15 · Jan 17, 2009 · Updated 17 years ago
ronsavage / Regexp-Assemble
Assemble multiple Regular Expressions into a single RE
☆15 · Nov 24, 2023 · Updated 2 years ago
seanmiller802 / webRTC-phone
A fully featured soft-phone built with Plivo's webRTC Browser SDK
☆17 · Sep 6, 2018 · Updated 7 years ago
meedan / alegre
A text and media analysis service for Meedan Check, a collaborative media annotation platform
☆15 · Jun 1, 2026 · Updated last month
apalle1 / Sentiment-Span-Extraction-Using-Transformer-Models
PyTorch - Albert Large V2, Bert Base Uncased, Bert Large Uncased WWM Finetuned Squad, Distil Roberta Base, Roberta Base Squad2, Roberta l…
☆11 · Jul 10, 2020 · Updated 6 years ago
nistvan86 / continuedev-llamacpp-gpu-llm-server
☆10 · Nov 22, 2023 · Updated 2 years ago
atrettel / wosp
Wosp: advanced full-text search on the command line
☆16 · Dec 27, 2025 · Updated 6 months ago
MX-Linux / mx-docs
Documentation for MX Linux
☆12 · Jan 17, 2026 · Updated 6 months ago
hltcoe / gazetteer-collection
☆12 · Mar 31, 2020 · Updated 6 years ago
samamoateng / Goalkicker
Free programming language books
☆10 · Jun 4, 2020 · Updated 6 years ago
gfaceless / srt2txt
convert subtitles to raw text
☆10 · Nov 23, 2016 · Updated 9 years ago
manestay / novel-chapter-dataset
Dataset for Paper "Exploring Content Selection in Summarization of Novel Chapters"
☆13 · Mar 20, 2023 · Updated 3 years ago
GochoMugo / msu
A minimal Bash framework and CLI tool that makes writing, sharing and using bash scripts easy
☆12 · Apr 23, 2026 · Updated 3 months ago
aperezdc / zsh-notes
Quick selection widget for Markdown notes, inspired by terminal_velocity
☆13 · Jul 2, 2020 · Updated 6 years ago
onnovalkering / vscode-singularity
Provides syntax highlighting for Apptainer/Singularity definition files.
☆10 · Dec 24, 2025 · Updated 7 months ago
niallo / Unworkable
Asynchronous Bittorrent Client written in C
☆16 · Feb 13, 2024 · Updated 2 years ago
mohit3011 / Online-Antisemitism-Detection-Using-MultimodalDeep-Learning
Repository for our paper “Subverting the Jewtocracy”: Online Antisemitism Detection Using MultimodalDeep Learning
☆12 · Apr 29, 2022 · Updated 4 years ago
TimeSlotTracker / timeslottracker-desktop
Simple and useful time tracker. Collects tasks and works (timeslots) in hierarchical tree. Has: reports (based on xslt templates), locali…
☆16 · Nov 6, 2021 · Updated 4 years ago
pipwerks / EasyCaptions
A JavaScript library for adding captioning to online videos. Also makes text transcript clickable, directing viewer to the point of the m…
☆25 · Nov 24, 2011 · Updated 14 years ago
domyounglee / TF-TrigramBlocking-transformer
Transformer based Trigram Blocking implementation in Tensorflow
☆11 · Feb 26, 2020 · Updated 6 years ago
quanganhdo / anthology
Bash script to create an ebook from a list of web articles. Inspired by the now-defunct Readlists.org by Readability
☆18 · Oct 13, 2019 · Updated 6 years ago
mblitton / GPT-Sentiment-Bot
☆20 · Apr 24, 2023 · Updated 3 years ago
Sreyan88 / Disfluency-Detection-with-Span-Classification
This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp…
☆14 · Jun 6, 2023 · Updated 3 years ago
AGoodId / django-s3-collectstatic
☆24 · Mar 26, 2013 · Updated 13 years ago
ilinguistics / common_crawl_corpus
Scripts for building a geo-located web corpus using Common Crawl data
☆11 · Jan 18, 2026 · Updated 6 months ago
EvgenyKashin / random-colabs
☆14 · Feb 24, 2021 · Updated 5 years ago
lordvlad / clock
☆12 · Jan 19, 2026 · Updated 6 months ago
lisehr / dq-meerkat
Automated Continuous Data Quality Measurement
☆12 · Nov 15, 2023 · Updated 2 years ago
aurooj / MMFT-BERT
☆14 · Jun 29, 2024 · Updated 2 years ago
go-air / dupi
A tool to find all duplicates in large sets of text documents.
☆16 · Sep 29, 2021 · Updated 4 years ago
NEFSC / READ-PSB-LWT-narwss_rwsas_apps
☆13 · Jul 6, 2026 · Updated 2 weeks ago