jcrobak/parquet-python

Python implementation of the Parquet columnar file format.

☆362

Alternatives and similar repositories for parquet-python

Users who are interested in parquet-python are comparing it to the libraries listed below.

dask / fastparquet
python implementation of the parquet columnar file format.
☆900 · Jun 29, 2026 · Updated 3 weeks ago
martindurant / fastparquet
python implementation of the parquet columnar file format.
☆21 · Jun 29, 2026 · Updated 3 weeks ago
dask / knit
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
☆54 · Jul 3, 2018 · Updated 8 years ago
apache / parquet-cpp
Apache Parquet
☆448 · May 7, 2024 · Updated 2 years ago
apache / parquet-format
Apache Parquet Format
☆2,498 · Jul 14, 2026 · Updated last week
altaurog / cpgcopy
Fast PostgreSQL bulk inserts with Cython and binary copy
☆12 · Jun 1, 2020 · Updated 6 years ago
dropbox / PyHive
Python interface to Hive and Presto. 🐝
☆1,697 · Apr 13, 2026 · Updated 3 months ago
adtech-labs / spylon
Utilities to work with Scala/Java code with py4j
☆40 · Jan 11, 2024 · Updated 2 years ago
blaze / odo
Data Migration for the Blaze Project
☆1,006 · Jul 15, 2022 · Updated 4 years ago
Automattic / cm-livy-scripts
Scripts for building Cloudera Manager parcel and CSD for Livy Spark Server
☆21 · Oct 18, 2017 · Updated 8 years ago
jupyter-incubator / sparkmagic
Jupyter magics and kernels for working with remote Spark clusters
☆1,364 · Sep 9, 2025 · Updated 10 months ago
databricks / spark-redshift
Redshift data source for Apache Spark
☆608 · Aug 10, 2023 · Updated 2 years ago
wesm / feather
Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
☆2,758 · Dec 8, 2025 · Updated 7 months ago
mrocklin / dask-spark
Dask and Spark interactions
☆21 · Mar 13, 2017 · Updated 9 years ago
pyathena-dev / PyAthenaJDBC
PyAthenaJDBC is an Amazon Athena JDBC driver wrapper for the Python DB API 2.0 (PEP 249).
☆94 · Sep 20, 2023 · Updated 2 years ago
dask / dask
Parallel computing with task scheduling
☆13,864 · Jul 14, 2026 · Updated last week
ibis-project / ibis
the portable Python dataframe library
☆6,601 · Updated this week
apache / parquet-java
Apache Parquet Java
☆3,069 · Updated this week
rbrush / kite-apps
Prescriptive Applications over Kite and Hadoop
☆12 · Oct 14, 2015 · Updated 10 years ago
aocenas / spark-docker-swarm
Spark on Docker Swarm example code
☆11 · Nov 27, 2016 · Updated 9 years ago
Netflix / iceberg
Iceberg is a table format for large, slow-moving tabular data
☆494 · Apr 10, 2023 · Updated 3 years ago
minrk / findspark
☆525 · Mar 1, 2026 · Updated 4 months ago
ogrisel / docker-distributed
Experimental docker-compose setup to bootstrap distributed on a docker-swarm cluster.
☆92 · Jan 11, 2018 · Updated 8 years ago
YelpArchive / schematizer
A schema store service that tracks and manages all the schemas used in the Data Pipeline
☆88 · Mar 2, 2021 · Updated 5 years ago
mumoshu / kube-airflow
A docker image and kubernetes config files to run Airflow on Kubernetes
☆655 · Jul 19, 2019 · Updated 7 years ago
Shinichi-Nakagawa / airflow-docker
Apache Airflow Docker Image.
☆16 · May 3, 2018 · Updated 8 years ago
pytries / DAWG
DAFSA-based dictionary-like read-only objects for Python. Based on `dawgdic` C++ library.
☆308 · Jun 11, 2024 · Updated 2 years ago
databricks / spark-avro
Avro Data Source for Apache Spark
☆537 · Dec 19, 2018 · Updated 7 years ago
dask / pandas-streaming
☆16 · Sep 28, 2017 · Updated 8 years ago
apache / arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
☆16,942 · Updated this week
fastavro / fastavro
Fast Avro for Python
☆711 · Updated this week
harelba / hadoop-job-analyzer
☆29 · Nov 17, 2014 · Updated 11 years ago
intake / python-snappy
Python bindings for the snappy google library
☆490 · Oct 16, 2024 · Updated last year
druid-io / pydruid
A Python connector for Druid
☆520 · May 4, 2026 · Updated 2 months ago
bufferapp / lookerpy
A Python API client for Looker
☆14 · Aug 2, 2018 · Updated 7 years ago
gavincyi / TickTickBacktest
Backtesting tool on tick data
☆11 · Jan 30, 2017 · Updated 9 years ago
Yelp / mrjob
Run MapReduce jobs on Hadoop or Amazon Web Services
☆2,610 · Apr 2, 2026 · Updated 3 months ago
exoscale / python-riemann-wrapper
time and report exception in riemann for functions
☆20 · Apr 17, 2016 · Updated 10 years ago
stripe-archive / herringbone
Tools for working with parquet, impala, and hive
☆135 · Jan 4, 2021 · Updated 5 years ago