Java implementation of MinHash and LSH for finding near-duplicate documents, as measured by Jaccard similarity.
☆33Mar 30, 2015Updated 11 years ago
Alternatives and similar repositories for MinHashLSH
Users who are interested in MinHashLSH are comparing it to the libraries listed below.
- A Java implementation of Locality Sensitive Hashing (LSH)☆300Nov 19, 2022Updated 3 years ago
- This provides tools for the b-bit MinHash algorithm.☆39Nov 21, 2025Updated 6 months ago
- A simple implementation of the simhash algorithm in Java.☆155Oct 10, 2020Updated 5 years ago
- Natural language processing algorithms, including text classification, sentiment analysis, TextRank, LDA, and so on☆12Mar 23, 2017Updated 9 years ago
- ☆12Sep 14, 2021Updated 4 years ago
- ☆12Jun 17, 2019Updated 6 years ago
- ☆11May 16, 2022Updated 4 years ago
- ☆11May 25, 2023Updated 3 years ago
- a list of links to help you make various important architectural decisions☆11Jul 13, 2016Updated 9 years ago
- The official implementation of EMNLP 2021 paper "#HowYouTagTweets: Learning User Hashtagging Preferences via Personalized Topic Attention…☆11Feb 21, 2023Updated 3 years ago
- Three ways to compute TF-IDF: Python, sklearn, gensim☆11Feb 26, 2019Updated 7 years ago
- Bitwise analysis tools☆16Feb 5, 2019Updated 7 years ago
- ☆16Apr 11, 2025Updated last year
- An old and super slow Python implementation of an HMM trigram POS tagger.☆17Mar 23, 2014Updated 12 years ago
- Java implementation of the well-known FuzzyWuzzy algorithm -- http://seatgeek.com/blog/dev/fuzzywuzzy-fuzzy-string-matching-in-python☆15Jul 13, 2016Updated 9 years ago
- CFG syntactic parsing for natural language processing☆10Mar 27, 2018Updated 8 years ago
- using FM latent vectors as embedding features☆14Sep 7, 2017Updated 8 years ago
- A custom AWS credential provider that allows your Hadoop or Spark application to access the S3 file system by assuming a role☆10Jan 9, 2026Updated 4 months ago
- ☆10Apr 15, 2023Updated 3 years ago
- Cross-Care☆11Jun 24, 2024Updated last year
- A Locality-Sensitive Hashing Library for Scala with optional Redis storage.☆16Jan 5, 2022Updated 4 years ago
- Migrate repositories from GitLab to GitHub☆22Jan 8, 2019Updated 7 years ago
- Uses a pretrained ALBERT model to recognize time expressions in text, while checking whether the model's prediction latency improves significantly.☆57Dec 16, 2019Updated 6 years ago
- ☆16Nov 17, 2023Updated 2 years ago
- ☆11Apr 13, 2026Updated last month
- Machine learning coursework☆25Jan 11, 2023Updated 3 years ago
- Examples of spark-lucenerdd☆15Oct 6, 2023Updated 2 years ago
- Java library for building clients for XHTML-based hypermedia APIs☆35Nov 7, 2011Updated 14 years ago
- I'm 99% sure that you've already heard about APIs or REST APIs; it's what Twitter, Flickr, and a lot more companies use to share their reso…☆29Apr 9, 2011Updated 15 years ago
- A Scheme dialect implemented in Python, supporting macros, continuations, lambda, various basic types, and more. It can be interpreted directly in Python or compiled to JavaScript; the compiled JS can interact dynamically with JavaScript (calls in both directions).☆21Jun 24, 2013Updated 12 years ago
- Accessing the Facebook Marketing API using httr in R, for demographic researchers☆21Nov 8, 2017Updated 8 years ago
- Concise tutorials for distributed training using PyTorch☆10Apr 18, 2023Updated 3 years ago
- ☆10Apr 16, 2022Updated 4 years ago
- CliBench: Multifaceted Evaluation of Large Language Models in Clinical Decisions on Diagnoses, Procedures, Lab Tests Orders and Prescript…☆19Aug 9, 2024Updated last year
- Using NLP techniques to summarize prompts for program synthesis☆17Sep 26, 2023Updated 2 years ago
- DSResSol: A sequence-based solubility predictor created with Dilated Squeeze Excitation Residual Networks☆12May 30, 2024Updated last year
- my own R course☆11Oct 14, 2014Updated 11 years ago
- PVDM, PVDBOW, doc2vec, sentence2vec, "Distributed Representations of Sentences and Documents ICML'14".☆21May 9, 2018Updated 8 years ago
- code for the paper "Personalized Context-Aware Re-ranking for E-commerce Recommendation Systems"☆52Jan 23, 2019Updated 7 years ago