MrPowers / spark-spec
Test suite to document the behavior of Spark
☆21 · Apr 15, 2021 · Updated 4 years ago
Alternatives and similar repositories for spark-spec
Users who are interested in spark-spec are comparing it to the libraries listed below.
- A Python library to generate static data catalog sites. Carte scrapes metadata from your data assets and generates a fully searchable fro… ☆29 · Jul 7, 2022 · Updated 3 years ago
- Spark data profiling utilities ☆22 · Nov 24, 2018 · Updated 7 years ago
- An example PySpark project with pytest ☆18 · Oct 13, 2017 · Updated 8 years ago
- ☆37 · Aug 29, 2018 · Updated 7 years ago
- Shed light on your data layout in order to monitor the health of your Lakehouse tables and identify when data maintenance operations shou… ☆10 · Jul 31, 2023 · Updated 2 years ago
- hmm-filter: Improve classifier predictions for sequential data with Hidden Markov Models (HMMs) ☆12 · Jan 23, 2019 · Updated 7 years ago
- A comprehensive ELT pipeline for analyzing passenger satisfaction data. Features a modern data architecture with Apache Airflow for extra… ☆12 · Oct 5, 2025 · Updated 4 months ago
- Final project for COS 521: Using Hokusai algorithm to approximate frequency counts of hashtags in twitter data stream. ☆12 · Jan 13, 2015 · Updated 11 years ago
- ☆10 · Nov 27, 2016 · Updated 9 years ago
- locust-based realbrowser load testing for looker instances ☆12 · Jul 25, 2023 · Updated 2 years ago
- Fulfills a GitHub workflow_job webhooks into a Pub/Sub queue. ☆12 · Mar 13, 2025 · Updated 11 months ago
- Angular google web starter kit! ☆34 · Oct 1, 2014 · Updated 11 years ago
- ☆13 · Jun 24, 2018 · Updated 7 years ago
- Open data for mobility in the Greater Oslo area ☆10 · Oct 1, 2019 · Updated 6 years ago
- ☆13 · Jan 30, 2026 · Updated 2 weeks ago
- Mesos on Mesos ☆15 · Mar 11, 2015 · Updated 10 years ago
- A library for managing groups of lambdas. ☆10 · Oct 30, 2022 · Updated 3 years ago
- An implementation of the MinHash algorithm in ruby using Murmur Hash ☆26 · May 8, 2009 · Updated 16 years ago
- Specs2 bindings for Scalaz ☆34 · Dec 28, 2017 · Updated 8 years ago
- Meta-repository of big data tools -- source and essential plugins for hadoop, pig, wukong, storm, kafka etc. ☆29 · Jun 29, 2014 · Updated 11 years ago
- A Go library for working with rows, columns, or matrix (deprecated, see https://github.com/shuLhan/share/tree/master/lib/tabula). ☆11 · Nov 28, 2018 · Updated 7 years ago
- Contains example dags and terraform code to create a composer with a node pool to run pods ☆13 · Oct 15, 2020 · Updated 5 years ago
- The DAMN (Data Assets Metric Navigation) tool extracts and reports metrics about your data assets ☆11 · Dec 27, 2024 · Updated last year
- Presto connector to Amazon Kinesis service. ☆14 · Jun 28, 2019 · Updated 6 years ago
- ☆16 · Jun 27, 2020 · Updated 5 years ago
- Animate your Power BI visuals ☆18 · Jan 10, 2023 · Updated 3 years ago
- Advanced concepts for optimizing pandas, dask and numba ☆12 · Sep 8, 2018 · Updated 7 years ago
- Sketching data structures for scala, including t-digest ☆15 · Sep 7, 2021 · Updated 4 years ago
- PySpark implementation of the Open Privacy Preserving Record Linkage (OPPRL) specification. ☆22 · Nov 7, 2025 · Updated 3 months ago
- Lib to communicate with Apple push notification services and make own APNs provider ☆49 · Apr 23, 2010 · Updated 15 years ago
- Code and setup information for Introduction to Machine Learning with Spark ☆12 · Sep 4, 2015 · Updated 10 years ago
- Code samples for an Ignite conference presentation on the topic of Automating Azure SQL Data Warehouse ☆11 · Mar 21, 2023 · Updated 2 years ago
- NodeJS bindings for Google CityHash ☆28 · Apr 18, 2013 · Updated 12 years ago
- National Pension Scheme (NPS) Fund Tracker with easy-to-use API for latest NAV. ☆18 · Updated this week
- Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive ☆186 · Oct 15, 2025 · Updated 3 months ago
- Apache Spark testing helpers (dependency free & works with Scalatest, uTest, and MUnit) ☆454 · Updated this week
- A framework for creating composable and pluggable data processing pipelines using Apache Spark, and running them on a cluster. ☆47 · Aug 1, 2016 · Updated 9 years ago
- Swimlane graphs for Hive, SparkSQL, and Presto based on Ganglia resource graphs ☆13 · Feb 13, 2017 · Updated 9 years ago
- Template for getting started with Hybrid Dagster Cloud ☆14 · Sep 19, 2025 · Updated 4 months ago