Find duplicate text files.
☆15Jan 14, 2025Updated last year
Alternatives and similar repositories for dedup
Users that are interested in dedup are comparing it to the libraries listed below.
- Code for extracting parallel corpora from pmindia☆17Jan 28, 2020Updated 6 years ago
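- Document deduplication that addresses semantically duplicate documents in search engines, using a semantic fingerprint algorithm based on multiple hashing.☆12Aug 17, 2013Updated 12 years ago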
- Remove duplicate documents/videos/images via popular algorithms such as SimHash, SpotSig, Shingling, etc.☆19Aug 28, 2023Updated 2 years ago
- MemCachier Django usage example☆13Nov 29, 2018Updated 7 years ago
- Microsoft Translator Api wrapper☆13Feb 12, 2019Updated 7 years ago
- Rabin hashing and content-defined chunking for Go☆20Sep 11, 2017Updated 8 years ago
- Various Java i18n tools, including tools for processing the Gettext and Properties formats☆16May 11, 2021Updated 4 years ago
- A Windows program to view/examine XLIFF file contents.☆14Sep 26, 2024Updated last year
- TAUS Dynamic Quality Framework API☆12Sep 17, 2020Updated 5 years ago
- uber_clone_with_flutter☆11Mar 18, 2022Updated 4 years ago
- Fast duplicate file detection library☆26Jan 5, 2017Updated 9 years ago
- A Postman collection of Azure Cognitive Services APIs.☆12Mar 16, 2022Updated 4 years ago
- Python package for deduplication/entity resolution using active learning☆83Aug 24, 2024Updated last year
- PHP Vulnerability Hunter (fork)☆13May 8, 2015Updated 10 years ago
- Utility to list duplicate files in one or more directories based on the file contents☆24Sep 23, 2024Updated last year
- ☆13Apr 16, 2022Updated 3 years ago
- ☆10Nov 22, 2023Updated 2 years ago
- Simplified version of flashrom for installing new system firmware☆23Mar 10, 2023Updated 3 years ago
- An OSINT tool to find data leaks on a targeted website☆17Mar 30, 2021Updated 4 years ago
- Bash script to create an ebook from a list of web articles. Inspired by the now-defunct Readlists.org by Readability☆18Oct 13, 2019Updated 6 years ago
- My OpenCode and Oh-My-OpenCode configuration files with API proxy setup documentation☆33Jan 5, 2026Updated 2 months ago
- Code and data for Teddy https://arxiv.org/abs/2001.05171.☆15Jun 21, 2022Updated 3 years ago
- Script to download all xkcd comics using web scraping.☆10Aug 27, 2021Updated 4 years ago
- Protect your sensitive HTML content with this AES encryption HTML loader. User will need to key in password in order to view the HTML con…☆11May 5, 2015Updated 10 years ago
- Repository for our paper "AbuseAnalyzer: Abuse Detection, Severity and Target Prediction for Gab Posts"☆11Jul 18, 2021Updated 4 years ago
- Doing style transfer with linguistic features using OpenAI's CLIP.☆14May 4, 2021Updated 4 years ago
- Reading list for multimodal sequence learning☆14Sep 4, 2023Updated 2 years ago
- Softphone using Twilio Client.☆21Mar 14, 2014Updated 12 years ago
- A minimal Bash framework and CLI tool that makes writing, sharing and using bash scripts easy☆13Updated this week
- Quick selection widget for Markdown notes, inspired by terminal_velocity☆13Jul 2, 2020Updated 5 years ago
- Dataset for Paper "Exploring Content Selection in Summarization of Novel Chapters"☆14Mar 20, 2023Updated 3 years ago
- Coursera Corpus Mining and Multistage Fine-Tuning for Improving Lectures Translation☆15Aug 27, 2024Updated last year
- ☆10Oct 15, 2020Updated 5 years ago
- ☆10Dec 3, 2020Updated 5 years ago
- Question generation from text☆15Sep 19, 2012Updated 13 years ago
- Transformer based Trigram Blocking implementation in Tensorflow☆11Feb 26, 2020Updated 6 years ago
- Asynchronous Bittorrent Client written in C☆16Feb 13, 2024Updated 2 years ago
- A JavaScript library for adding captioning to online videos. Also makes text transcript clickable, directing viewer to the point of the m…☆26Nov 24, 2011Updated 14 years ago
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp…☆15Jun 6, 2023Updated 2 years ago