A pure python HDFS client
★859 · Apr 19, 2022 · Updated 3 years ago
Alternatives and similar repositories for snakebite
Users who are interested in snakebite are comparing it to the libraries listed below.
- API and command line interface for HDFS ★276 · Sep 24, 2024 · Updated last year
- Python interface to Hive and Presto. ★1,695 · Aug 7, 2024 · Updated last year
- A native go client for HDFS ★1,409 · Jan 22, 2025 · Updated last year
- What happens on the wire when Hadoop RPC call is issued? ★13 · Jul 1, 2022 · Updated 3 years ago
- A Python MapReduce and HDFS API for Hadoop ★242 · Jan 19, 2026 · Updated last month
- Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, vis… ★18,681 · Updated this week
- A wrapper for libhdfs3 to interact with HDFS from Python ★137 · Feb 9, 2021 · Updated 5 years ago
- Python MapReduce library written in Cython. Visit us in #hadoopy on freenode. See the link below for documentation and tutorials. ★241 · Jan 8, 2016 · Updated 10 years ago
- Jupyter magics and kernels for working with remote Spark clusters ★1,362 · Sep 9, 2025 · Updated 5 months ago
- Livy is an open source REST interface for interacting with Apache Spark from anywhere ★1,007 · Oct 5, 2022 · Updated 3 years ago
- Python client for Apache Kafka ★5,888 · Updated this week
- Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol) ★742 · Jul 31, 2025 · Updated 7 months ago
- ★208 · Apr 28, 2016 · Updated 9 years ago
- Pure Python wrapper for the Hadoop WebHDFS Rest API ★52 · Aug 10, 2020 · Updated 5 years ago
- Mirror of Apache Toree (Incubating) ★749 · Feb 21, 2026 · Updated last week
- Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. ★6,605 · Updated this week
- Run MapReduce jobs on Hadoop or Amazon Web Services ★2,618 · Mar 24, 2023 · Updated 2 years ago
- Dr. Elephant is a job and flow-level performance monitoring and tuning tool for Apache Hadoop and Apache Spark ★1,371 · Aug 22, 2023 · Updated 2 years ago
- REST job server for Apache Spark ★2,842 · Jul 8, 2025 · Updated 7 months ago
- Real-time Query for Hadoop; mirror of Apache Impala ★34 · Dec 27, 2022 · Updated 3 years ago
- Apache Airflow - A platform to programmatically author, schedule, and monitor workflows ★44,430 · Updated this week
- TensorFlowOnSpark brings TensorFlow programs to Apache Spark clusters. ★3,858 · Jul 10, 2023 · Updated 2 years ago
- [UNMAINTAINED] A developer-friendly Python library to interact with Apache HBase ★612 · Updated this week
- A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, orga… ★2,260 · Feb 19, 2026 · Updated last week
- C++ native client for Impala and Hive, with Python / pandas bindings ★72 · Aug 15, 2018 · Updated 7 years ago
- Open source SQL Query Assistant service for Databases/Warehouses ★1,467 · Feb 20, 2026 · Updated last week
- CMAK is a tool for managing Apache Kafka clusters ★11,952 · Aug 2, 2023 · Updated 2 years ago
- Alluxio, data orchestration for analytics and machine learning in the cloud ★7,157 · Apr 29, 2025 · Updated 10 months ago
- Parallel computing with task scheduling ★13,746 · Updated this week
- Apache Druid: a high performance real-time analytics database. ★13,942 · Updated this week
- Grumpy is a Python to Go source code transcompiler and runtime. ★10,531 · Jan 18, 2022 · Updated 4 years ago
- Interactive and Reactive Data Science using Scala and Spark. ★3,152 · May 16, 2023 · Updated 2 years ago
- Apache Spark - A unified analytics engine for large-scale data processing ★42,898 · Updated this week
- Python client for Hadoop® YARN API ★109 · Sep 26, 2022 · Updated 3 years ago
- A translation of the WordCount example from the Hadoop tutorial from Java to Scala. ★32 · Jul 6, 2010 · Updated 15 years ago
- GraphFrames is a package for Apache Spark which provides DataFrame-based Graphs ★1,135 · Feb 6, 2026 · Updated 3 weeks ago
- A formatter for Python files ★13,988 · Feb 20, 2026 · Updated last week
- Pure python HDFS client: python3.x version ★26 · Jan 23, 2026 · Updated last month
- python implementation of the parquet columnar file format. ★358 · Oct 26, 2021 · Updated 4 years ago