Detecting near-duplicates using Moses Charikar's algorithm
☆20 · Oct 7, 2014 · Updated 11 years ago
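Charikar's technique (usually called simhash) turns each document into a short fingerprint. Similar documents get fingerprints that differ in only a few bits, so near-duplicates can be found by comparing Hamming distances. The following is a minimal sketch and is not code from charikars_algorithm. It assumes lowercase word tokens weighted by term frequency and a 64-bit fingerprint derived from MD5. The function names `simhash`, `hamming_distance` and `is_near_duplicate`, and the default threshold of 3 differing bits, are hypothetical choices made for illustration.

```python
import hashlib
import re
from collections import Counter


def token_hash(token: str, bits: int = 64) -> int:
    # Stable hash of a token: the first bits // 8 bytes of its MD5 digest
    # (assumes bits is a multiple of 8 and at most 128).
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[: bits // 8], "big")


def simhash(text: str, bits: int = 64) -> int:
    # Assumed features: lowercase word tokens weighted by term frequency.
    weights = Counter(re.findall(r"\w+", text.lower()))
    vector = [0] * bits
    for token, weight in weights.items():
        h = token_hash(token, bits)
        for i in range(bits):
            # Each token votes +weight where its hash bit is 1 and -weight where it is 0.
            vector[i] += weight if (h >> i) & 1 else -weight
    # The fingerprint keeps one bit per position: 1 if the weighted vote is positive.
    fingerprint = 0
    for i, total in enumerate(vector):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    # Number of bit positions in which the two fingerprints differ.
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, threshold: int = 3) -> bool:
    # Hypothetical default: treat fingerprints within 3 bits of each other as near-duplicates.
    return hamming_distance(simhash(a), simhash(b)) <= threshold
```

Each fingerprint bit is the sign of a weighted vote over pseudo-random hash bits. As a result, two documents that share most of their weighted tokens tend to agree on most bits, and Hamming distance roughly tracks their similarity. The right threshold depends on the fingerprint width and the corpus.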
Alternatives and similar repositories for charikars_algorithm
Users that are interested in charikars_algorithm are comparing it to the libraries listed below.
- KERT: Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents · ☆10 · Aug 31, 2015 · Updated 10 years ago
- The Tweets2013 Internet Archive collection · ☆10 · Aug 7, 2020 · Updated 5 years ago
- Latent Dirichlet Allocation on tweets · ☆15 · May 17, 2015 · Updated 10 years ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives · ☆16 · Jun 10, 2021 · Updated 4 years ago
- ArSarcasm-v2 is an extension to the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analys… · ☆12 · Jan 26, 2022 · Updated 4 years ago
- Twitter event detection and location inference tools, built for my dissertation. · ☆12 · Nov 22, 2022 · Updated 3 years ago
- Plot tree based machine learning models · ☆11 · Oct 11, 2024 · Updated last year
- Object Resource Stream and CDXJ Drafts · ☆15 · Nov 28, 2018 · Updated 7 years ago
- ☆14 · Mar 7, 2019 · Updated 7 years ago
- Near-duplicate detection tool · ☆24 · Nov 27, 2016 · Updated 9 years ago
- ☆13 · Sep 13, 2015 · Updated 10 years ago
- ☆14 · Dec 9, 2014 · Updated 11 years ago
- Migrate your active record migrations to ecto compatible migrations · ☆13 · Jun 7, 2015 · Updated 10 years ago
- Python API for Various DB-Backed Simhash Clusters · ☆64 · Mar 16, 2017 · Updated 9 years ago
- Sources is a web application that allows your team to store, manage and annotate your sources and to make them easily available to your r… · ☆13 · Jun 2, 2022 · Updated 3 years ago
- Mention-anomaly-based event detection and tracking in Twitter · ☆17 · Sep 28, 2016 · Updated 9 years ago
- Code from Bellingcat's guide · ☆11 · Dec 8, 2022 · Updated 3 years ago
- TREC evaluation demonstration/Query Expansion module for Lucene for a lecture on Information Retrieval; About parsing the TREC 10G datase… · ☆21 · Nov 17, 2015 · Updated 10 years ago
- Sentiment analysis with PredictionIO and CML · ☆12 · Jul 8, 2015 · Updated 10 years ago
- JSON with biographical and political data of Austrian Members of Parliament (Nationalrat/first Chamber) since 1920. · ☆13 · Dec 23, 2021 · Updated 4 years ago
- Decoders for weather sensor data from RTL SDR. · ☆18 · Apr 27, 2025 · Updated 11 months ago
- Google Coral containers · ☆12 · Apr 28, 2022 · Updated 3 years ago
- An open relation extraction system · ☆47 · Nov 23, 2021 · Updated 4 years ago
- ☆13 · Jan 20, 2021 · Updated 5 years ago
- A tool for collection archival slivers of the web and web archives · ☆17 · Feb 18, 2025 · Updated last year
- Exploration of Health-Related Tweets through Topic Modeling & Sentiment Analysis · ☆20 · Apr 17, 2024 · Updated last year
- Code for "A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification" (IJCAI 2018) · ☆23 · Jul 14, 2018 · Updated 7 years ago
- Create and edit WARC and WACZ files · ☆25 · Dec 6, 2024 · Updated last year
- A PyData 2013 talk on straightforward, data-driven ways to handle natural language text in Python. · ☆51 · Oct 23, 2014 · Updated 11 years ago
- R tools for GDELT and the Global Knowledge Graph · ☆14 · Jan 10, 2014 · Updated 12 years ago
- Seeder - Czech webarchive curating tool and public site · ☆17 · Feb 12, 2026 · Updated last month
- ... just because nltk is too heavy · ☆35 · Jul 21, 2010 · Updated 15 years ago
- Analyze standard numbers like ARK, DOI, EAN, GTIN, IBAN, ISAN, ISBN, ISMN, ISNI, ISSN, ISTC, ISWC, ORCID, PPN, SICI, UPC, ZDB with Elasti… · ☆24 · Jul 5, 2016 · Updated 9 years ago
- Browser-based annotation tool for Framenet · ☆16 · Jan 27, 2015 · Updated 11 years ago
- A simple clock widget for Übersicht · ☆20 · Apr 15, 2019 · Updated 6 years ago
- Hadoop YARN & MapReduce Memory Calculator · ☆13 · Nov 9, 2015 · Updated 10 years ago
- Open-source Chrome extension for injecting and overriding HTTP request headers · ☆15 · Jul 4, 2024 · Updated last year
- Extracts plain text, language identification and more metadata from WARC records · ☆23 · Oct 1, 2025 · Updated 6 months ago
- Workshop on Noisy User-generated Text (W-NUT) · ☆31 · Mar 3, 2026 · Updated last month