spotify/snakebite

A pure Python HDFS client

☆857

Alternatives and similar repositories for snakebite

Users who are interested in snakebite compare it to the libraries listed below.

mtth / hdfs
API and command line interface for HDFS
☆276 · Sep 24, 2024 · Updated last year
dropbox / PyHive
Python interface to Hive and Presto. 🐝
☆1,696 · Apr 13, 2026 · Updated 3 months ago
crs4 / pydoop
A Python MapReduce and HDFS API for Hadoop
☆241 · Jan 19, 2026 · Updated 6 months ago
colinmarc / hdfs
A native Go client for HDFS
☆1,404 · Jan 22, 2025 · Updated last year
elazarl / hadoop_rpc_walktrhough
What happens on the wire when a Hadoop RPC call is issued?
☆13 · Jul 1, 2022 · Updated 4 years ago
spotify / luigi
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis…
☆18,746 · Updated this week
dask / hdfs3
A wrapper for libhdfs3 to interact with HDFS from Python
☆137 · Feb 9, 2021 · Updated 5 years ago
jupyter-incubator / sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
☆1,364 · Sep 9, 2025 · Updated 10 months ago
pywebhdfs / pywebhdfs
Pure Python wrapper for the Hadoop WebHDFS REST API
☆52 · Aug 10, 2020 · Updated 5 years ago
cloudera / impyla
Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
☆742 · Jun 19, 2026 · Updated last month
Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,610 · Apr 2, 2026 · Updated 3 months ago
cloudera / livy
Livy is an open source REST interface for interacting with Apache Spark from anywhere
☆1,008 · Oct 5, 2022 · Updated 3 years ago
dpkp / kafka-python
Python client for Apache Kafka
☆5,900 · Updated this week
cloudera / hs2client
C++ native client for Impala and Hive, with Python / pandas bindings
☆72 · Aug 15, 2018 · Updated 7 years ago
linkedin / dr-elephant
Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark
☆1,370 · Aug 22, 2023 · Updated 2 years ago
BradRuderman / pyhs2
☆207 · Apr 28, 2016 · Updated 10 years ago
apache / zeppelin
Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more.
☆6,645 · Updated this week
internetarchive / snakebite-py3
Pure Python HDFS client: Python 3.x version
☆26 · Jan 23, 2026 · Updated 5 months ago
cloudera / Impala
Real-time Query for Hadoop; mirror of Apache Impala
☆34 · Dec 27, 2022 · Updated 3 years ago
python-happybase / happybase
[UNMAINTAINED] A developer-friendly Python library to interact with Apache HBase
☆608 · Mar 16, 2026 · Updated 4 months ago
apache / gobblin
A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga…
☆2,270 · Jun 24, 2026 · Updated 3 weeks ago
apache / incubator-toree
Mirror of Apache Toree (Incubating)
☆751 · Updated this week
apache / airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
☆46,196 · Updated this week
spark-jobserver / spark-jobserver
REST job server for Apache Spark
☆2,837 · Mar 3, 2026 · Updated 4 months ago
gateway-experiments / hadoop-yarn-api-python-client
Python client for Hadoop® YARN API
☆109 · Sep 26, 2022 · Updated 3 years ago
yahoo / CMAK
CMAK is a tool for managing Apache Kafka clusters
☆11,927 · Aug 2, 2023 · Updated 2 years ago
datadudes / salesforce2hadoop
Import Salesforce data into Hadoop HDFS in Avro format
☆23 · Jan 8, 2020 · Updated 6 years ago
apache / spark
Apache Spark - A unified analytics engine for large-scale data processing
☆43,666 · Updated this week
haohui / libhdfspp
libhdfs++ is a modern implementation of an HDFS client in C++11, designed for massively parallel processing (MPP) applications.
☆28 · Jul 6, 2015 · Updated 11 years ago
dstreev / hdfs-cli
Traverse HDFS with directory context and without JVM startup delays! Supports multiple HDFS hosts, command-line history, and tab completion.
☆17 · May 20, 2016 · Updated 10 years ago
yahoo / TensorFlowOnSpark
TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters.
☆3,846 · Jul 10, 2023 · Updated 3 years ago
apache / druid
Apache Druid: a high performance real-time analytics database.
☆14,034 · Updated this week
cloudera / hue
Open source SQL Query Assistant service for Databases/Warehouses
☆1,413 · Updated this week
paulmw / hive-udf
☆16 · Apr 17, 2014 · Updated 12 years ago
bwhite / hadoopy
Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials.
☆243 · Jan 8, 2016 · Updated 10 years ago
dask / dask
Parallel computing with task scheduling
☆13,865 · Updated this week
databricks / tensorframes
[DEPRECATED] TensorFlow wrapper for DataFrames on Apache Spark
☆744 · Jul 30, 2024 · Updated last year
twitter / scalding
A Scala API for Cascading
☆3,522 · May 28, 2023 · Updated 3 years ago
databricks / spark-sklearn
(Deprecated) Scikit-learn integration package for Apache Spark
☆1,071 · Dec 3, 2019 · Updated 6 years ago