Detecting near duplicates using Moses Charikar's algorithm
☆20 · Oct 7, 2014 · Updated 11 years ago
Alternatives and similar repositories for charikars_algorithm
Users who are interested in charikars_algorithm are comparing it to the libraries listed below
- KERT: Automatic Construction and Ranking of Topical Keyphrases on Collections of Short Documents ☆10 · Aug 31, 2015 · Updated 10 years ago
- The Tweets2013 Internet Archive collection ☆10 · Aug 7, 2020 · Updated 5 years ago
- regex powered yank+substitute ☆13 · Oct 23, 2017 · Updated 8 years ago
- Latent Dirichlet Allocation on tweets ☆15 · May 17, 2015 · Updated 10 years ago
- Docker Compose based system for running remote browsers (including Flash and Java support) connected to web archives ☆16 · Jun 10, 2021 · Updated 4 years ago
- Mining and Utilizing Dataset Relevancy from Oceanographic Datasets to Improve Data Discovery and Access, online demo: https://mudrod.jpl.… ☆16 · Mar 2, 2018 · Updated 8 years ago
- ArSarcasm-v2 is an extension to the original ArSarcasm dataset. It was used for the shared task on sarcasm detection and sentiment analys… ☆12 · Jan 26, 2022 · Updated 4 years ago
- Object Resource Stream and CDXJ Drafts ☆15 · Nov 28, 2018 · Updated 7 years ago
- ☆14 · Mar 7, 2019 · Updated 7 years ago
- Migrate your Active Record migrations to Ecto-compatible migrations ☆13 · Jun 7, 2015 · Updated 10 years ago
- Python API for Various DB-Backed Simhash Clusters ☆64 · Mar 16, 2017 · Updated 9 years ago
- Sources is a web application that allows your team to store, manage and annotate your sources and to make them easily available to your r… ☆13 · Jun 2, 2022 · Updated 3 years ago
- Code from Bellingcat's guide ☆11 · Dec 8, 2022 · Updated 3 years ago
- TREC evaluation demonstration/Query Expansion module for Lucene for a lecture on Information Retrieval; About parsing the TREC 10G datase… ☆21 · Nov 17, 2015 · Updated 10 years ago
- JSON with biographical and political data of Austrian Members of Parliament (Nationalrat/first Chamber) since 1920. ☆13 · Dec 23, 2021 · Updated 4 years ago
- Ruby gem to get timezone info by known position (latitude, longitude) using the Google Time Zone API ☆23 · Jul 28, 2021 · Updated 4 years ago
- Decoders for weather sensor data from RTL SDR. ☆18 · Apr 27, 2025 · Updated 10 months ago
- Web-site mirroring tool for archive.org ☆24 · Updated this week
- Google Coral containers ☆12 · Apr 28, 2022 · Updated 3 years ago
- ☆11 · May 23, 2019 · Updated 6 years ago
- A tool for collecting archival slivers of the web and web archives ☆17 · Feb 18, 2025 · Updated last year
- Code for "A Hierarchical End-to-End Model for Jointly Improving Text Summarization and Sentiment Classification" (IJCAI 2018) ☆23 · Jul 14, 2018 · Updated 7 years ago
- This repository provides German documentation relating to the text recognition software Tesseract. The documentation was created in the c… ☆13 · Sep 6, 2022 · Updated 3 years ago
- Create and edit WARC and WACZ files ☆24 · Dec 6, 2024 · Updated last year
- A PyData 2013 talk on straightforward, data-driven ways to handle natural language text in Python. ☆51 · Oct 23, 2014 · Updated 11 years ago
- source code of bison ☆26 · Jul 20, 2020 · Updated 5 years ago
- R tools for GDELT and the Global Knowledge Graph ☆14 · Jan 10, 2014 · Updated 12 years ago
- Seeder - Czech webarchive curating tool and public site ☆17 · Feb 12, 2026 · Updated last month
- ... just because nltk is too heavy ☆35 · Jul 21, 2010 · Updated 15 years ago
- Find duplicate text files. ☆15 · Jan 14, 2025 · Updated last year
- Analyze standard numbers like ARK, DOI, EAN, GTIN, IBAN, ISAN, ISBN, ISMN, ISNI, ISSN, ISTC, ISWC, ORCID, PPN, SICI, UPC, ZDB with Elasti… ☆24 · Jul 5, 2016 · Updated 9 years ago
- Browser-based annotation tool for FrameNet ☆16 · Jan 27, 2015 · Updated 11 years ago
- Hadoop YARN & MapReduce Memory Calculator ☆13 · Nov 9, 2015 · Updated 10 years ago
- Workshop on Noisy User-generated Text (W-NUT) ☆30 · Mar 3, 2026 · Updated 2 weeks ago
- Digs into Dicts (lists and tuples) ☆15 · Jun 23, 2015 · Updated 10 years ago
- CrisisLex: Your data and lexical resource in crises ☆52 · Jan 25, 2024 · Updated 2 years ago
- collection of modules to build distributed and reliable concurrent systems in Python. ☆206 · Sep 14, 2013 · Updated 12 years ago
- Cablemap - WikiLeaks Cablegate parser and Topic Maps converter ☆15 · Dec 1, 2015 · Updated 10 years ago
- Python wrapper for the FrameNet library. ☆24 · Jul 26, 2011 · Updated 14 years ago