mozilla / python_mozetl
ETL jobs for Firefox Telemetry
☆28 · Updated 3 months ago
Alternatives and similar repositories for python_mozetl
Users who are interested in python_mozetl are comparing it to the libraries listed below.
- Airflow configuration for Telemetry ☆193 · Updated last week
- A toolkit providing a uniform interface for connecting to and extracting data from a wide variety of (potentially remote) data stores (in… ☆254 · Updated last month
- Airflow workflow management platform chef cookbook. ☆71 · Updated 6 years ago
- Telemetry Analysis Service ☆37 · Updated 5 years ago
- Snowplow event tracker for Python. Add analytics to your Python and Django apps, webapps and games ☆45 · Updated 3 months ago
- Data analysis and reporting tool for quick access to custom charts and tables in Jupyter Notebooks and in the shell. ☆122 · Updated last year
- Airflow declarative DAGs via YAML ☆133 · Updated last year
- Documentation and implementation of telemetry ingestion on Google Cloud Platform ☆83 · Updated this week
- REST-like API exposing Airflow data and operations ☆61 · Updated 6 years ago
- locopy: Loading/Unloading to Redshift and Snowflake using Python. ☆110 · Updated 3 weeks ago
- ☆75 · Updated 5 months ago
- An example PySpark project with pytest ☆16 · Updated 7 years ago
- Data processing and modelling framework for automating tasks (incl. Python & SQL transformations). ☆120 · Updated 2 months ago
- Airflow plugin to transfer arbitrary files between operators ☆78 · Updated 6 years ago
- CLI tool to launch Spark jobs on AWS EMR ☆67 · Updated last year
- Helpers & syntactic sugar for PySpark. ☆62 · Updated 2 years ago
- A Scalable Data Cleaning Library for PySpark. ☆29 · Updated 6 years ago
- This repository holds some python libraries and plugins designed to be used with MemSQL. ☆62 · Updated 2 years ago
- A plugin for Apache Airflow that allows you to manage the users that can login ☆14 · Updated 5 years ago
- Thin-client metrics library for use with Atlas and SpectatorD ☆48 · Updated last month
- Serializes data into a JSON format using AVRO schema. ☆137 · Updated 3 years ago
- Collection of dockerized ETL jobs managed by data engineering. ☆20 · Updated this week
- Ansible role to deploy and configure Airflow ☆41 · Updated 3 weeks ago
- Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead ☆52 · Updated 7 years ago
- A toolset to streamline running spark python on EMR ☆20 · Updated 8 years ago
- Metadata service library for Amundsen ☆83 · Updated last month
- A python client library for the Stitch Import API ☆42 · Updated last year
- Data Catalog for Databases and Data Warehouses ☆35 · Updated last year
- Functional testing framework for Big Data pipelines. ☆57 · Updated 2 years ago
- PySpark phonetic and string matching algorithms ☆39 · Updated last year