spotify/spydra

Ephemeral Hadoop clusters using Google Compute Platform

☆136

Alternatives and similar repositories for spydra

Users that are interested in spydra are comparing it to the libraries listed below.

spotify / styx
"The path to execution", Styx is a service that schedules batch data processing jobs in Docker containers on Kubernetes.
☆271 · Jul 12, 2023 · Updated 3 years ago
spotify / hype
Runs JVM closures in Docker containers on Kubernetes
☆38 · Mar 23, 2018 · Updated 8 years ago
spotify / gcs-tools
GCS support for avro-tools, parquet-tools and protobuf
☆80 · Jul 14, 2026 · Updated last week
spotify / scio
A Scala API for Apache Beam and Google Cloud Dataflow.
☆2,625 · Updated this week
spotify / ratatool
A tool for data sampling, data generation, and data diffing
☆349 · Updated this week
spotify / featran
A Scala feature transformation library for data science and machine learning
☆475 · Feb 7, 2025 · Updated last year
spotify / dbeam
DBeam exports SQL tables into Avro files using JDBC and Apache Beam
☆197 · Jun 22, 2026 · Updated last month
spotify / spark-bigquery
Google BigQuery support for Spark, SQL, and DataFrames
☆156 · Dec 14, 2019 · Updated 6 years ago
wepay / kafka-connect-bigquery
DEPRECATED. PLEASE USE https://github.com/confluentinc/kafka-connect-bigquery. A Kafka Connect BigQuery sink connector
☆151 · Mar 4, 2024 · Updated 2 years ago
spotify / gordon-gcp
GCP Plugin for Gordon: Event-driven Cloud DNS
☆12 · Apr 5, 2023 · Updated 3 years ago
zrlio / albis
Albis: High-Performance File Format for Big Data Systems
☆21 · Jul 12, 2018 · Updated 8 years ago
GoogleCloudDataproc / jupyterhub-dataprocspawner
☆14 · May 27, 2022 · Updated 4 years ago
GoogleCloudDataproc / hadoop-connectors
Libraries and tools for interoperability between Hadoop-related open-source software and Google Cloud Platform.
☆292 · Updated this week
spotify / zoltar
Common library for serving TensorFlow, XGBoost and scikit-learn models in production.
☆143 · Sep 11, 2023 · Updated 2 years ago
yu-iskw / spark-streaming-with-google-cloud-example
An example of integrating Spark Streaming with Google Pub/Sub and Google Datastore
☆16 · Mar 22, 2017 · Updated 9 years ago
SnootyMonkey / clj-json-ld
The Clojure library for JSON-LD (JavaScript Object Notation for Linking Data).
☆16 · Feb 7, 2019 · Updated 7 years ago
spotify / async-google-pubsub-client
[SUNSET] Async Google Pubsub Client
☆158 · Mar 18, 2023 · Updated 3 years ago
GoogleCloudDataproc / hive-bigquery-storage-handler
Hive Storage Handler for interoperability between BigQuery and Apache Hive
☆19 · Jan 29, 2025 · Updated last year
harelba / hadoop-job-analyzer
☆29 · Nov 17, 2014 · Updated 11 years ago
GoogleCloudDataproc / initialization-actions
Run in all nodes of your cluster before the cluster starts - lets you customize your cluster
☆597 · Updated this week
spotify / heroic
The Heroic Time Series Database
☆846 · Mar 26, 2021 · Updated 5 years ago
thefactory / marathon-logger
Event logging service for Mesos Marathon
☆15 · Jul 3, 2014 · Updated 12 years ago
amient / affinity
Library and a Framework for building fast, scalable, fault-tolerant Data APIs based on Akka, Avro, ZooKeeper and Kafka
☆25 · Oct 16, 2020 · Updated 5 years ago
hortonworks-gallery / ambari-iframe-view
Embed any webapp/website as Ambari view!
☆25 · Feb 26, 2016 · Updated 10 years ago
GoogleCloudDataproc / bdutil
[DEPRECATED] Script used to manage Hadoop and Spark instances on Google Compute Engine
☆108 · Nov 15, 2019 · Updated 6 years ago
alexvanboxel / airflow-gcp-examples
Repository with examples and smoke tests for the GCP Airflow operators and hooks
☆152 · Jan 15, 2017 · Updated 9 years ago
GoogleCloudDataproc / cloud-dataproc
Cloud Dataproc: Samples and Utils
☆205 · Jul 10, 2026 · Updated 2 weeks ago
GoogleCloudPlatform / ci-cd-for-data-processing-workflow
☆47 · May 3, 2024 · Updated 2 years ago
aseigneurin / kafka-streams-scala
Thin Scala wrapper for the Kafka Streams API
☆49 · Mar 29, 2018 · Updated 8 years ago
C0urante / avro-random-generator
Used to generate mock Avro data
☆15 · Jun 23, 2018 · Updated 8 years ago
dcos-labs / drax
DC/OS Resilience Automated Xenodiagnosis tool
☆42 · Jul 10, 2019 · Updated 7 years ago
ExpediaGroup / circus-train
Circus Train is a dataset replication tool that copies Hive tables between clusters and clouds.
☆93 · Mar 5, 2024 · Updated 2 years ago
spotify / cassandra-reaper
Software to run automated repairs of cassandra
☆233 · Jun 25, 2018 · Updated 8 years ago
lightbend / fdp-sample-applications
All sample applications for Fast Data Platform
☆15 · Jul 1, 2019 · Updated 7 years ago
GoogleCloudPlatform / bigquery-data-importer
A tool to import large datasets to BigQuery with automatic schema detection.
☆26 · Jun 25, 2019 · Updated 7 years ago
dataArtisans / cascading-flink
Cascading on Apache Flink®
☆54 · Feb 5, 2024 · Updated 2 years ago
elodina / syscol
Collect local Mesos slave, underlying operating system and machine metrics and produce to Apache Kafka
☆20 · Jan 29, 2016 · Updated 10 years ago
randerzander / r-service
Ambari Service definition for deploying R & RHadoop libraries
☆18 · Aug 3, 2015 · Updated 10 years ago
eBay / oink
REST based interface for PIG execution
☆25 · Dec 13, 2021 · Updated 4 years ago