Test suite to document the behavior of Spark
☆21 Apr 15, 2021 Updated 4 years ago
Alternatives and similar repositories for spark-spec
Users that are interested in spark-spec are comparing it to the libraries listed below
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆29 Jul 7, 2022 Updated 3 years ago
- An example PySpark project with pytest ☆18 Oct 13, 2017 Updated 8 years ago
- Spark data profiling utilities ☆23 Nov 24, 2018 Updated 7 years ago
- Spark functions to run popular phonetic and string matching algorithms ☆59 Feb 22, 2022 Updated 4 years ago
- ☆37 Aug 29, 2018 Updated 7 years ago
- Slides and code for "Validating Models in R" Strata 2016 RDay http://conferences.oreilly.com/strata/hadoop-big-data-ca/public/schedule/de… ☆10 Jun 22, 2020 Updated 5 years ago
- ☆10 Jun 30, 2022 Updated 3 years ago
- hmm-filter: Improve classifier predictions for sequential data with Hidden Markov Models (HMMs) ☆12 Jan 23, 2019 Updated 7 years ago
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ☆10 Jul 31, 2023 Updated 2 years ago
- A comprehensive ELT pipeline for analyzing passenger satisfaction data. Features a modern data architecture with Apache Airflow for extra… ☆12 Oct 5, 2025 Updated 5 months ago
- ASAP smoothing ☆13 Sep 8, 2017 Updated 8 years ago
- ☆15 Nov 28, 2018 Updated 7 years ago
- ☆10 Nov 27, 2016 Updated 9 years ago
- A command-line tool that summarizes the size of a codebase by language, showing lines of code with and without comments and blank lines. ☆47 Updated this week
- ☆10 Feb 3, 2019 Updated 7 years ago
- Mesos on Mesos ☆15 Mar 11, 2015 Updated 10 years ago
- Locust-based real-browser load testing for Looker instances ☆12 Jul 25, 2023 Updated 2 years ago
- Fast convolution algorithms with Python types ☆10 Nov 20, 2016 Updated 9 years ago
- ☆13 Jun 24, 2018 Updated 7 years ago
- Final project for COS 521: Using Hokusai algorithm to approximate frequency counts of hashtags in Twitter data stream. ☆12 Jan 13, 2015 Updated 11 years ago
- HackerNews reader ☆10 Nov 13, 2015 Updated 10 years ago
- Small program to run requests against a web server and look for problems ☆11 Jan 20, 2016 Updated 10 years ago
- A library for managing groups of lambdas. ☆10 Feb 27, 2026 Updated last week
- Open data for mobility in the Greater Oslo area ☆10 Oct 1, 2019 Updated 6 years ago
- PROJECT NO LONGER USED, See 'isbnnetinclj' ☆35 Jan 30, 2012 Updated 14 years ago
- Rovers is a service to retrieve repository URLs from multiple repository hosting providers. ☆15 Jul 2, 2019 Updated 6 years ago
- Angular google web starter kit! ☆34 Oct 1, 2014 Updated 11 years ago
- Install and uninstall Go package in isolated path, to keep your `GOPATH/pkg` clean. Like `pipx`, but for Go. ☆13 Jul 15, 2024 Updated last year
- An implementation of the MinHash algorithm in Ruby using Murmur Hash ☆26 May 8, 2009 Updated 16 years ago
- Animate your Power BI visuals ☆18 Jan 10, 2023 Updated 3 years ago
- Meta-repository of big data tools -- source and essential plugins for hadoop, pig, wukong, storm, kafka etc. ☆29 Jun 29, 2014 Updated 11 years ago
- PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification. ☆22 Nov 7, 2025 Updated 3 months ago
- ☆18 Feb 13, 2026 Updated 3 weeks ago
- Filling in the Spark function gaps across APIs ☆50 Apr 14, 2021 Updated 4 years ago
- ☆14 Feb 28, 2026 Updated last week
- Python solver for mixed-effects models ☆97 Jun 3, 2025 Updated 9 months ago
- Manage Jenkins with Grunt. ☆45 May 17, 2017 Updated 8 years ago
- JumpSpark - A modern cookiecutter template for pyspark projects with batteries included. ☆10 May 12, 2023 Updated 2 years ago
- Specs2 bindings for Scalaz ☆34 Dec 28, 2017 Updated 8 years ago