python implementation of the parquet columnar file format.
☆358 · Oct 26, 2021 · Updated 4 years ago
Alternatives and similar repositories for parquet-python
Users that are interested in parquet-python are comparing it to the libraries listed below
- python implementation of the parquet columnar file format. ☆889 · Jan 6, 2026 · Updated last month
- Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead ☆53 · Jul 3, 2018 · Updated 7 years ago
- python implementation of the parquet columnar file format. ☆21 · Dec 18, 2025 · Updated 2 months ago
- Apache Parquet ☆447 · May 7, 2024 · Updated last year
- Python interface to Hive and Presto. 🐝 ☆1,695 · Aug 7, 2024 · Updated last year
- Apache Parquet Format ☆2,250 · Updated this week
- Dask and Spark interactions ☆21 · Mar 13, 2017 · Updated 8 years ago
- Utilities to work with Scala/Java code with py4j ☆40 · Jan 11, 2024 · Updated 2 years ago
- Data Migration for the Blaze Project ☆1,005 · Jul 15, 2022 · Updated 3 years ago
- Jupyter magics and kernels for working with remote Spark clusters ☆1,362 · Sep 9, 2025 · Updated 5 months ago
- Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow ☆2,753 · Dec 8, 2025 · Updated 2 months ago
- Prescriptive Applications over Kite and Hadoop ☆12 · Oct 14, 2015 · Updated 10 years ago
- Parallel computing with task scheduling ☆13,746 · Updated this week
- ☆525 · Jan 1, 2026 · Updated last month
- Apache Parquet Java ☆3,025 · Updated this week
- the portable Python dataframe library ☆6,404 · Feb 21, 2026 · Updated last week
- Experimental docker-compose setup to bootstrap distributed on a docker-swarm cluster. ☆92 · Jan 11, 2018 · Updated 8 years ago
- PyAthenaJDBC is an Amazon Athena JDBC driver wrapper for the Python DB API 2.0 (PEP 249). ☆94 · Sep 20, 2023 · Updated 2 years ago
- Iceberg is a table format for large, slow-moving tabular data ☆490 · Apr 10, 2023 · Updated 2 years ago
- ☆16 · Sep 28, 2017 · Updated 8 years ago
- Fast Avro for Python ☆699 · Dec 23, 2025 · Updated 2 months ago
- A schema store service that tracks and manages all the schemas used in the Data Pipeline ☆88 · Mar 2, 2021 · Updated 4 years ago
- A Python connector for Druid ☆517 · Sep 11, 2025 · Updated 5 months ago
- Python bindings for the snappy google library ☆487 · Oct 16, 2024 · Updated last year
- DAFSA-based dictionary-like read-only objects for Python. Based on `dawgdic` C++ library. ☆305 · Jun 11, 2024 · Updated last year
- Dockerfile for Apache Zeppelin ☆17 · Dec 9, 2015 · Updated 10 years ago
- [UNMAINTAINED] A developer-friendly Python library to interact with Apache HBase ☆612 · Updated this week
- A Topic Modeling toolbox ☆92 · Apr 26, 2016 · Updated 9 years ago
- A columnar data container that can be compressed. ☆959 · Oct 27, 2022 · Updated 3 years ago
- Language defining a data description protocol ☆186 · Feb 14, 2023 · Updated 3 years ago
- Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with … ☆653 · Feb 4, 2026 · Updated 3 weeks ago
- Run MapReduce jobs on Hadoop or Amazon Web Services ☆2,618 · Mar 24, 2023 · Updated 2 years ago
- A Python package to manage extremely large amounts of data ☆1,361 · Updated this week
- Scripts for building Cloudera Manager parcel and CSD for Livy Spark Server ☆21 · Oct 18, 2017 · Updated 8 years ago
- A distributed task scheduler for Dask ☆1,667 · Feb 21, 2026 · Updated last week
- Python client for Apache Kafka ☆5,888 · Updated this week
- Multi-user server for Jupyter notebooks ☆8,231 · Updated this week
- A pure python HDFS client ☆859 · Apr 19, 2022 · Updated 3 years ago
- NumPy and Pandas interface to Big Data ☆3,197 · Sep 29, 2023 · Updated 2 years ago