Detecting near-duplicates using Moses Charikar's algorithm
☆20Apr 27, 2026Updated 3 weeks ago
Alternatives and similar repositories for charikars_algorithm
Users that are interested in charikars_algorithm are comparing it to the libraries listed below.
- KERT: Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents☆10Aug 31, 2015Updated 10 years ago
- Near-Duplicate Detection in Python.☆25Jul 23, 2021Updated 4 years ago
- The Tweets2013 Internet Archive collection☆10Aug 7, 2020Updated 5 years ago
- regex powered yank+substitute☆13Oct 23, 2017Updated 8 years ago
- A pure-Python count-min sketch, fast and accurate.☆16May 21, 2017Updated 9 years ago
- Plot tree based machine learning models☆11Oct 11, 2024Updated last year
- Near-duplicate detection tool☆24Nov 27, 2016Updated 9 years ago
- Project developed during internship at MITU Skillologies for summarizing news articles in the form of Topic Models.☆14Jul 3, 2019Updated 6 years ago
- Arabic To English translation using transformer neural nets.☆15Mar 15, 2019Updated 7 years ago
- Python API for Various DB-Backed Simhash Clusters☆64Mar 16, 2017Updated 9 years ago
- Data and experiments with world population densities for comparison to addresses☆12Mar 15, 2017Updated 9 years ago
- ☆11May 23, 2019Updated 6 years ago
- Exploration of Health-Related Tweets through Topic Modeling & Sentiment Analysis☆20Apr 17, 2024Updated 2 years ago
- Code for "A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification" (IJCAI 2018)☆23Jul 14, 2018Updated 7 years ago
- source code of bison☆26Jul 20, 2020Updated 5 years ago
- R tools for GDELT and the Global Knowledge Graph☆14Jan 10, 2014Updated 12 years ago
- Find duplicate text files.☆14Jan 14, 2025Updated last year
- Browser-based annotation tool for Framenet☆16Jan 27, 2015Updated 11 years ago
- A simple clock widget for Übersicht☆20Apr 15, 2019Updated 7 years ago
- Workshop on Noisy User-generated Text (W-NUT)☆31Apr 24, 2026Updated 3 weeks ago
- Digs into Dicts (lists and tuples)☆15Jun 23, 2015Updated 10 years ago
- CrisisLex: Your data and lexical resource in crises☆52Jan 25, 2024Updated 2 years ago
- A Theano implementation of a CNN DSEBM (deep structured energy-based model) described in https://arxiv.org/pdf/1605.07717v2.pdf☆10Oct 13, 2016Updated 9 years ago
- collection of modules to build distributed and reliable concurrent systems in Python.☆207Sep 14, 2013Updated 12 years ago
- Cablemap - WikiLeaks Cablegate parser and Topic Maps converter☆15Dec 1, 2015Updated 10 years ago
- Python wrapper for the FrameNet library.☆24Jul 26, 2011Updated 14 years ago
- Sense Disambiguation of Connectives for PDTB-Style Discourse Parsing☆14Jan 13, 2017Updated 9 years ago
- Mapping Wikileaks' Cablegate thematics using Python, MongoDB and Gephi☆17Nov 9, 2018Updated 7 years ago
- Multi-Perspective Relevance Matching with Hierarchical ConvNets for Social Media Search (Rao et al. AAAI'19)☆27Nov 21, 2022Updated 3 years ago
- SIGIR 2017: Embedding-based query expansion for weighted sequential dependence retrieval model☆36Aug 2, 2017Updated 8 years ago
- neovim python plugin framework☆13Aug 31, 2020Updated 5 years ago
- Natural language hashing library.☆10Nov 24, 2014Updated 11 years ago
- WhatsApp statistics toolkit mirror☆10Mar 24, 2019Updated 7 years ago
- run multiple shell commands in parallel and coordinate their output☆31Jul 5, 2012Updated 13 years ago
- Introduction Notebook to Extreme Multi-Label Classification problem (XML)☆22Sep 9, 2018Updated 7 years ago
- Using latent Dirichlet allocation (LDA) in Apache Lucene☆57Nov 19, 2012Updated 13 years ago
- Machine Learning solution for Kaggle.com's "Partly Sunny with a Chance of Hashtags"☆27Dec 6, 2013Updated 12 years ago
- A command line utility for pattern matching similar to 'grep', but supports capture groups and multiline matches.☆19May 25, 2012Updated 13 years ago
- Triton/Manta DNS server over Apache Zookeeper☆25May 13, 2026Updated last week