Detecting near duplicates using Moses Charikar's algorithm
☆20 · Apr 27, 2026 · Updated last month
Alternatives and similar repositories for charikars_algorithm
Users that are interested in charikars_algorithm are comparing it to the libraries listed below.
- Near-Duplicate Detection in Python. ☆25 · Jul 23, 2021 · Updated 4 years ago
- Latent Dirichlet Allocation on tweets ☆15 · May 17, 2015 · Updated 11 years ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆16 · Jun 10, 2021 · Updated 5 years ago
- ArSarcasm-v2 is an extension to the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analys… ☆12 · Jan 26, 2022 · Updated 4 years ago
- Object Resource Stream and CDXJ Drafts ☆15 · Nov 28, 2018 · Updated 7 years ago
- Project developed during internship at MITU Skillologies for summarizing news articles in the form of Topic Models. ☆14 · Jul 3, 2019 · Updated 6 years ago
- ☆13 · Sep 13, 2015 · Updated 10 years ago
- ☆14 · Dec 9, 2014 · Updated 11 years ago
- Migrate your active record migrations to ecto compatible migrations ☆13 · Jun 7, 2015 · Updated 11 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 · Mar 16, 2017 · Updated 9 years ago
- ☆11 · Nov 21, 2025 · Updated 6 months ago
- Commands and snippets for faster Javascript with Atom ☆11 · Aug 26, 2016 · Updated 9 years ago
- Data and experiments with world population densities for comparison to addresses ☆12 · Mar 15, 2017 · Updated 9 years ago
- Sources is a web application that allows your team to store, manage and annotate your sources and to make them easily available to your r… ☆13 · Jun 2, 2022 · Updated 4 years ago
- Code from Bellingcat's guide ☆11 · Dec 8, 2022 · Updated 3 years ago
- ☆16 · Feb 25, 2020 · Updated 6 years ago
- TREC evaluation demonstration/Query Expansion module for Lucene for a lecture on Information Retrieval; About parsing the TREC 10G datase… ☆21 · Nov 17, 2015 · Updated 10 years ago
- Sentiment analysis with PredictionIO and CML ☆12 · Jul 8, 2015 · Updated 10 years ago
- JSON with biographical and political data of Austrian Members of Parliament (Nationalrat/first Chamber) since 1920. ☆13 · Dec 23, 2021 · Updated 4 years ago
- Decoders for weather sensor data from RTL SDR. ☆18 · Apr 27, 2025 · Updated last year
- Google Coral containers ☆12 · Apr 28, 2022 · Updated 4 years ago
- An open relation extraction system ☆48 · Nov 23, 2021 · Updated 4 years ago
- ☆13 · Jan 20, 2021 · Updated 5 years ago
- Exploration of Health-Related Tweets through Topic Modeling & Sentiment Analysis ☆20 · Apr 17, 2024 · Updated 2 years ago
- This repository provides German documentation relating to the text recognition software Tesseract. The documentation was created in the c… ☆15 · Sep 6, 2022 · Updated 3 years ago
- Create and edit WARC and WACZ files ☆29 · Dec 6, 2024 · Updated last year
- A PyTorch implementation of Fader Networks: Manipulating Images by Sliding Attributes by Lample et al. ☆12 · Aug 27, 2017 · Updated 8 years ago
- This repository contains the Arabic sarcasm dataset (ArSarcasm) ☆29 · Feb 18, 2021 · Updated 5 years ago
- R tools for GDELT and the Global Knowledge Graph ☆14 · Jan 10, 2014 · Updated 12 years ago
- Analyze standard numbers like ARK, DOI, EAN, GTIN, IBAN, ISAN, ISBN, ISMN, ISNI, ISSN, ISTC, ISWC, ORCID, PPN, SICI, UPC, ZDB with Elasti… ☆24 · Jul 5, 2016 · Updated 9 years ago
- Browser-based annotation tool for Framenet ☆16 · Jan 27, 2015 · Updated 11 years ago
- GeoIP Target Redirection and Target Filter Redirector using GeoIP API (JS) ☆22 · Oct 14, 2020 · Updated 5 years ago
- Workshop on Noisy User-generated Text (W-NUT) ☆31 · Updated this week
- Digs into Dicts (lists and tuples) ☆15 · Jun 23, 2015 · Updated 10 years ago
- A growing collection of generative art projects ☆14 · Oct 8, 2019 · Updated 6 years ago
- A distributed image crawler ☆19 · Feb 9, 2015 · Updated 11 years ago
- Realtime currency conversion for Elixir ☆22 · Apr 2, 2016 · Updated 10 years ago
- A Theano implementation of a CNN DSEBM (deep structured energy-based model) described in https://arxiv.org/pdf/1605.07717v2.pdf ☆10 · Oct 13, 2016 · Updated 9 years ago
- collection of modules to build distributed and reliable concurrent systems in Python. ☆207 · Sep 14, 2013 · Updated 12 years ago