Find duplicate text files.
☆14Jan 14, 2025Updated last year
Alternatives and similar repositories for dedup
Users that are interested in dedup are comparing it to the libraries listed below.
- Deduplication for cfDNA sequencing data☆11Jul 5, 2017Updated 8 years ago
- Code for extracting parallel corpora from pmindia☆17Jan 28, 2020Updated 6 years ago
- Pile Deduplication Code☆18May 15, 2023Updated 2 years ago
- Find near-duplicate documents using minhashing implemented in Go.☆16Dec 22, 2015Updated 10 years ago
- Microsoft Translator Api wrapper☆12Feb 12, 2019Updated 7 years ago
- 🕹️ Group and deduplicate concurrent tasks☆30Updated this week
- Rabin hashing and content-defined chunking for Go☆20Sep 11, 2017Updated 8 years ago
- Various Java i18n tools, including tools for processing the Gettext and Properties formats☆15May 11, 2021Updated 4 years ago
- TAUS Dynamic Quality Framework API☆11Sep 17, 2020Updated 5 years ago
- Python library and dashboard for hyperparameter search and model training for computer vision tasks based on PyTorch, Optuna, FiftyOne, D…☆17Jul 14, 2023Updated 2 years ago
- Flickr Follower Bot : Bot for Flickr, in .Net Core, using a Chrome client and Selenium for command it☆12Mar 7, 2021Updated 5 years ago
- Content Defined Chunking playground☆50Mar 26, 2026Updated last month
- uber_clone_with_flutter☆10Mar 18, 2022Updated 4 years ago
- ✨ Epris is a JavaScript library that simplifies interface development☆26May 30, 2022Updated 3 years ago
- The API is for anyone who wants to adopt best practices for a translation services API to interact with counterparts directly from your a…☆11Dec 18, 2016Updated 9 years ago
- Cross-browser wrapper for window.console object.☆14Apr 14, 2015Updated 11 years ago
- Short text similarity matching model based on deep learning and machine learning☆15Jan 9, 2019Updated 7 years ago
- The Project is a sample project we ask prospective consultants to complete as part of their interview process.☆10May 29, 2016Updated 9 years ago
- Python package for deduplication/entity resolution using active learning☆82Aug 24, 2024Updated last year
- Multiple ways of chunking for data deduplication: Fixed size chunking, Content defined chunking, and File based chunking.☆19Dec 20, 2013Updated 12 years ago
- PyTorch - Albert Large V2, Bert Base Uncased, Bert Large Uncased WWM Finetuned Squad, Distil Roberta Base, Roberta Base Squad2, Roberta l…☆11Jul 10, 2020Updated 5 years ago
- ☆13Apr 16, 2022Updated 4 years ago
- ☆10Nov 22, 2023Updated 2 years ago
- If you want a quick and dirty way to programmatically meta descriptions at scale using Python, this is the tutorial for you. Jupyter note…☆20Sep 1, 2021Updated 4 years ago
- The Resource Static Analysis enables companies and localization suppliers to quickly add scalable validation checks to help ensure qualit…☆18Nov 28, 2022Updated 3 years ago
- Experimental command suggestion system based on historical usage of commands in certain locations.☆12Feb 18, 2026Updated 2 months ago
- Crawling engine that crawls a set of top-level domains looking for documents in a list of languages☆11Feb 6, 2024Updated 2 years ago
- My OpenCode and Oh-My-OpenCode configuration files with API proxy setup documentation☆36Jan 5, 2026Updated 4 months ago
- A Hindi Image Captioning system made completely with Transformers🤗☆10Apr 16, 2024Updated 2 years ago
- Script to download all xkcd comics using web scraping.☆10Aug 27, 2021Updated 4 years ago
- Reading list for multimodal sequence learning☆14Sep 4, 2023Updated 2 years ago
- Softphone using Twilio Client.☆20Mar 14, 2014Updated 12 years ago
- Simple and useful time tracker. Collects tasks and works (timeslots) in hierarchical tree. Has: reports (based on xslt templates), locali…☆16Nov 6, 2021Updated 4 years ago
- Provides syntax highlighting for Apptainer/Singularity definition files.☆10Dec 24, 2025Updated 4 months ago
- ☆10Dec 3, 2020Updated 5 years ago
- Question generation from text☆15Sep 19, 2012Updated 13 years ago
- Transformer based Trigram Blocking implementation in Tensorflow☆11Feb 26, 2020Updated 6 years ago
- Super simple, zero config options, <2kb declarative tooltip library with no dependencies.☆17Jun 2, 2023Updated 2 years ago
- This repository contains the implementation of the paper: "Span Classification with Structured Information for Disfluency Detection in Sp…☆15Jun 6, 2023Updated 2 years ago