itspawanbhardwaj/spark-fuzzy-matching

Fuzzy matching function in Spark (https://spark-packages.org/package/itspawanbhardwaj/spark-fuzzy-matching)

☆24

Alternatives and similar repositories for spark-fuzzy-matching

Users who are interested in spark-fuzzy-matching are comparing it to the libraries listed below.

MrPowers / spark-stringmetric
Spark functions to run popular phonetic and string matching algorithms
☆60 · Feb 22, 2022 · Updated 4 years ago
falkenbach / actor-glassdoor-jobs
Scrapes job data from Glassdoor. Fast and free Glassdoor Scraper to extract all data from job listings including salaries, companies, and…
☆17 · Dec 20, 2023 · Updated 2 years ago
facundoolano / advenjure-example
Example game for the advenjure engine
☆10 · Oct 15, 2017 · Updated 8 years ago
wavepot / live
Live-performance-focused version of wavepot
☆14 · May 8, 2015 · Updated 11 years ago
microsoft / AzureML-R-template
Patterns and examples for running R code with Azure Machine Learning
☆22 · Sep 29, 2022 · Updated 3 years ago
ga4gh / pedigree
Repository for the family history/pedigree project
☆13 · Jun 17, 2026 · Updated last month
ChrisMaherLab / PACT
☆10 · Sep 14, 2023 · Updated 2 years ago
Azure / spark-cdm
A Spark connector for the Azure Common Data Model
☆15 · May 31, 2023 · Updated 3 years ago
common-workflow-lab / cwl-ex
CWL experimental grammar
☆11 · Nov 16, 2025 · Updated 8 months ago
mskcc / ACCESS-Pipeline
cfDNA Sequencing Pipeline with UMI
☆11 · Jun 11, 2026 · Updated last month
WillianFuks / example_dataproc_twitter
This repository is used as source code for the Medium post about implementing a Twitter recommender system using GCP.
☆31 · Dec 12, 2017 · Updated 8 years ago
ing-bank / spark-matcher
Record matching and entity resolution at scale in Spark
☆36 · Oct 31, 2023 · Updated 2 years ago
PolusAI / sophios
A domain-specific language for creating scientific pipelines
☆15 · Jul 1, 2026 · Updated 2 weeks ago
amineHY / docker-streamlit-app
Run a Streamlit web application, test it, and deploy it to a cloud service (GCP, AWS, Heroku)
☆14 · Oct 8, 2022 · Updated 3 years ago
harisankarsadasivan / RawMap
Complements Minimap2 for a fast and efficient Read-Until pipeline
☆14 · Mar 21, 2023 · Updated 3 years ago
kircherlab / MPRAsnakeflow
With this snakemake pipeline you can process your MPRA sequencing data (assignment and count). It is the standard MPRA pipeline of the IG…
☆16 · Jul 7, 2026 · Updated 2 weeks ago
r-spark / sparkhail
A sparklyr extension for Hail
☆15 · Jul 8, 2021 · Updated 5 years ago
karlkumbier / iRF
☆14 · Nov 27, 2025 · Updated 7 months ago
shauli-ravfogel / conformal-prediction
☆10 · Feb 2, 2023 · Updated 3 years ago
stefan-jansen / topic-modeling
☆14 · Feb 10, 2023 · Updated 3 years ago
microsoft / Azure-Synapse-Customer-Insights-Customer360-Solution-Accelerator
Solution accelerator to help developers build an end-to-end Customer 360 solution using Azure Synapse Analytics and Dynamics 360 Customer…
☆32 · Feb 8, 2023 · Updated 3 years ago
dedupeio / doublemetaphone
Python wrapper for a C++ Double Metaphone
☆15 · Jan 12, 2026 · Updated 6 months ago
paraluke23 / automl-handson
Notebook for workshop
☆20 · Aug 6, 2019 · Updated 6 years ago
Gaglia88 / sparker
SparkER: an Entity Resolution framework for Apache Spark
☆67 · Mar 29, 2024 · Updated 2 years ago
frictionlessdata / tableschema-r
An R library for working with Table Schema.
☆27 · Apr 10, 2025 · Updated last year
theislab / cellrank_reproducibility
CellRank's reproducibility repository.
☆17 · Jan 13, 2022 · Updated 4 years ago
rkitchen / exceRpt
Software for preprocessing, filtering, alignment, and reporting of smallRNA-seq datasets
☆20 · Jan 13, 2021 · Updated 5 years ago
edawson / mkmh
Generate kmers/minimizers/hashes/MinHash signatures, including with multiple kmer sizes.
☆24 · Jan 9, 2021 · Updated 5 years ago
rworkflow / Rcwl
Write CWL in R
☆15 · May 9, 2024 · Updated 2 years ago
DiseaseNeuroGenomics / snMultiome
Analysis scripts for snMultiome project
☆16 · Dec 12, 2024 · Updated last year
scala / sbt-scala-module
sbt plugin for Scala modules.
☆13 · Updated this week
TheJacksonLaboratory / CloudNeo
CWL implementation of CloudNeo: A cloud pipeline for identifying patient-specific tumor neoantigens
☆19 · Apr 19, 2019 · Updated 7 years ago
Azure / data-product-streaming
Template to deploy a Data Product for data stream processing into a Data Landing Zone of the Data Management & Analytics Scenario (former…
☆36 · Jul 17, 2023 · Updated 3 years ago
RyZenoKelb / Terminux
🖥️ Modern web-based terminal simulator with authentic Ubuntu interface. Features complete file system, built-in text editor, file downlo…
☆15 · May 28, 2025 · Updated last year
devlace / azure-databricks-anomaly
Anomaly Detection Pipeline on Azure Databricks
☆28 · Jul 29, 2019 · Updated 6 years ago
GrowinScala / Flipper
PDF to JSON, JSON to PDF, etc.
☆12 · Apr 18, 2018 · Updated 8 years ago
GMOD / vcf-js
VCF (variant call format) parser
☆26 · Jun 26, 2026 · Updated 3 weeks ago
sbg / sevenbridges-cwl
Seven Bridges Python library for programmatic generation of CWL workflows.
☆21 · Apr 13, 2026 · Updated 3 months ago
GaryMcD / rustacean_gpt
Meet Rustacean GPT, an experimental project transforming OpenAI's GPT into a helpful, autonomous software engineer to support senior deve…
☆14 · May 10, 2023 · Updated 3 years ago