jcrist/skein

A tool and library for easily deploying applications on Apache YARN

☆145

Alternatives and similar repositories for skein

Users who are interested in skein are comparing it to the libraries listed below.

dask / dask-yarn
Deploy dask on YARN clusters
☆69 · Aug 10, 2024 · Updated last year
jupyterhub / yarnspawner
Spawn JupyterHub single user notebook servers in Hadoop/YARN containers.
☆19 · Apr 23, 2025 · Updated last year
criteo / cluster-pack
A library on top of either pex or conda-pack to make your Python code easily available on a cluster
☆47 · Feb 4, 2026 · Updated 5 months ago
jcrist / hadoop-test-cluster
Dockerized setup for testing code on realistic hadoop clusters
☆26 · Jul 20, 2020 · Updated 6 years ago
tony-framework / TonY
TonY is a framework to natively run deep learning frameworks on Apache Hadoop.
☆708 · Oct 14, 2023 · Updated 2 years ago
hj2016 / hudi-test
☆12 · Sep 25, 2024 · Updated last year
atl-jugheads / trapper-keeper
🔥 binders
☆10 · Mar 4, 2018 · Updated 8 years ago
allwefantasy / mammuthus-yarn-client
a project most codes extracting from spark-yarn module make build yarn program more easy
☆13 · Apr 9, 2016 · Updated 10 years ago
ExpediaGroup / datasqueeze
Hadoop utility to compact small files
☆18 · Feb 16, 2026 · Updated 5 months ago
criteo / babar
Profiler for large-scale distributed java applications (Spark, Scalding, MapReduce, Hive,...) on YARN.
☆129 · Sep 7, 2018 · Updated 7 years ago
dask / crick
Streaming and approximate algorithms. WIP, use at own risk.
☆27 · Sep 4, 2025 · Updated 10 months ago
lenddoefl / filters
Validation and data pipelines made easy!
☆12 · Oct 8, 2019 · Updated 6 years ago
dask / hdfs3
A wrapper for libhdfs3 to interact with HDFS from Python
☆137 · Feb 9, 2021 · Updated 5 years ago
sinhrks / daskperiment
Reproducibility for Humans: A lightweight tool to perform reproducible machine learning experiment.
☆24 · Apr 24, 2019 · Updated 7 years ago
mtth / hdfs
API and command line interface for HDFS
☆276 · Sep 24, 2024 · Updated last year
cylondata / cylon
Cylon is a fast, scalable, distributed memory, parallel runtime with a Pandas like DataFrame.
☆303 · May 26, 2026 · Updated last month
takluyver / flonda
Conda packages from flit information
☆10 · Dec 10, 2021 · Updated 4 years ago
acroz / pylivy
A Python client for Apache Livy, enabling use of remote Apache Spark clusters.
☆71 · Jan 5, 2022 · Updated 4 years ago
hortonworks-spark / spark-hive-streaming-sink
A sink to save Spark Structured Streaming DataFrame into Hive table
☆23 · May 7, 2018 · Updated 8 years ago
ExpediaGroup / beekeeper
Service for automatically managing and cleaning up unreferenced data
☆50 · Apr 24, 2026 · Updated 2 months ago
WillianFuks / pyClickModels
ClickModels for Search Engines Implemented on top of Cython.
☆13 · Jun 9, 2021 · Updated 5 years ago
qubole / sparklens
Qubole Sparklens tool for performance tuning Apache Spark
☆592 · Jun 26, 2024 · Updated 2 years ago
jupyterhub / jupyterhub-on-hadoop
Documentation and resources for deploying JupyterHub on Hadoop
☆19 · Jul 16, 2019 · Updated 7 years ago
rjagerman / glint
Glint: High performance scala parameter server
☆170 · Jul 20, 2018 · Updated 8 years ago
hopshadoop / hops
Hops Hadoop is a distribution of Apache Hadoop with distributed metadata.
☆324 · Jan 22, 2026 · Updated 5 months ago
dask / dask-gateway
A multi-tenant server for securely deploying and managing Dask clusters.
☆146 · Updated this week
fitnr / unwiki
Python module to remove wiki markup text.
☆10 · Jan 15, 2016 · Updated 10 years ago
neoremind / app-on-yarn-demo
Demo for service oriented application hosted on Hadoop YARN cluster for HA and scheduling
☆23 · Apr 2, 2018 · Updated 8 years ago
denoland / terraform-provider-deno
Terraform provider for hosted Deno APIs
☆16 · Mar 18, 2025 · Updated last year
flokkr / docker-baseimage
Base hadoop/spark/bigdata image with advanced config loading scripts.
☆11 · Nov 3, 2020 · Updated 5 years ago
databricks / koalas
Koalas: pandas API on Apache Spark
☆3,371 · Mar 20, 2024 · Updated 2 years ago
linkedin / dynamometer
A tool for scale and performance testing of HDFS with a specific focus on the NameNode.
☆135 · Jan 11, 2024 · Updated 2 years ago
combust / mleap
MLeap: Deploy ML Pipelines to Production
☆1,539 · Jul 10, 2026 · Updated last week
jcrist / ptime
IPython magic for parallel profiling (like `%time`, but parallel)
☆72 · Jul 17, 2017 · Updated 9 years ago
jcuda / jcuda-matrix-utils
Utility classes for dense and sparse matrices in JCuda
☆11 · Mar 8, 2019 · Updated 7 years ago
PolideaInternal / airflow-breeze-gcp-extension
☆24 · Apr 16, 2020 · Updated 6 years ago
mjstealey / hadoop
Apache Hadoop - Docker distribution based on CentOS 7 and Oracle Java 8
☆12 · Feb 20, 2018 · Updated 8 years ago
lifeomic / sparkflow
Easy to use library to bring Tensorflow on Apache Spark
☆295 · Oct 11, 2023 · Updated 2 years ago
netcomm / miniconf
Centralized configuration management
☆26 · Jun 14, 2017 · Updated 9 years ago