dask/fastparquet

python implementation of the parquet columnar file format.

☆900

Alternatives and similar repositories for fastparquet

Users that are interested in fastparquet are comparing it to the libraries listed below.

jcrobak / parquet-python
python implementation of the parquet columnar file format.
☆362 · Oct 26, 2021 · Updated 4 years ago
dask / dask
Parallel computing with task scheduling
☆13,871 · Updated this week
martindurant / fastparquet
python implementation of the parquet columnar file format.
☆21 · Jun 29, 2026 · Updated 3 weeks ago
blue-yonder / turbodbc
Turbodbc is a Python module to access relational databases via the Open Database Connectivity (ODBC) interface. The module complies with …
☆658 · Updated this week
wesm / feather
Feather: fast, interoperable binary data frame storage for Python, R, and more powered by Apache Arrow
☆2,757 · Dec 8, 2025 · Updated 7 months ago
dask / distributed
A distributed task scheduler for Dask
☆1,675 · Updated this week
fsspec / s3fs
S3 Filesystem
☆1,041 · Jun 29, 2026 · Updated 3 weeks ago
ibis-project / ibis
the portable Python dataframe library
☆6,610 · Updated this week
dask / hdfs3
A wrapper for libhdfs3 to interact with HDFS from Python
☆137 · Feb 9, 2021 · Updated 5 years ago
apache / arrow
Apache Arrow is the universal columnar format and multi-language toolbox for fast data interchange and in-memory analytics
☆16,956 · Updated this week
d6t / d6tflow
Python library for building highly effective data science workflows
☆947 · Jun 28, 2026 · Updated 3 weeks ago
apache / parquet-cpp
Apache Parquet
☆448 · May 7, 2024 · Updated 2 years ago
vaexio / vaex
Out-of-Core hybrid Apache Arrow/NumPy DataFrame for Python, ML, visualization and exploration of big tabular data at a billion rows per s…
☆8,511 · Apr 1, 2026 · Updated 3 months ago
dask / dask-ml
Scalable Machine Learning with Dask
☆950 · Sep 27, 2025 · Updated 9 months ago
dask / dask-xgboost
☆161 · Jul 14, 2021 · Updated 5 years ago
apache / parquet-format
Apache Parquet Format
☆2,506 · Updated this week
dask / partd
Concurrent appendable key-value storage
☆108 · Jul 15, 2024 · Updated 2 years ago
blaze / odo
Data Migration for the Blaze Project
☆1,006 · Jul 15, 2022 · Updated 4 years ago
dask / dask-kubernetes
Native Kubernetes integration for Dask
☆324 · Jul 3, 2026 · Updated 3 weeks ago
piskvorky / smart_open
Utils for streaming large files (S3, HDFS, gzip, bz2...)
☆3,454 · Jul 15, 2026 · Updated last week
fastavro / fastavro
Fast Avro for Python
☆711 · Updated this week
databricks / koalas
Koalas: pandas API on Apache Spark
☆3,372 · Mar 20, 2024 · Updated 2 years ago
dask / dask-tensorflow
☆93 · Jan 8, 2020 · Updated 6 years ago
modin-project / modin
Modin: Scale your Pandas workflows by changing a single line of code
☆10,393 · Feb 10, 2026 · Updated 5 months ago
quantopian / qgrid
An interactive grid for sorting, filtering, and editing DataFrames in Jupyter notebooks
☆3,088 · Jan 12, 2024 · Updated 2 years ago
dask / knit
Deprecated, please use https://github.com/jcrist/skein or https://github.com/dask/dask-yarn instead
☆54 · Jul 3, 2018 · Updated 8 years ago
dask / zict
Useful Mutable Mappings
☆72 · Oct 31, 2023 · Updated 2 years ago
dask / dask-searchcv
dask-searchcv is now part of dask-ml: https://github.com/dask/dask-ml
☆239 · Oct 13, 2018 · Updated 7 years ago
Blosc / bcolz
A columnar data container that can be compressed.
☆958 · Oct 27, 2022 · Updated 3 years ago
nteract / papermill
📚 Parameterize, execute, and analyze notebooks
☆6,462 · Jul 6, 2026 · Updated 2 weeks ago
joblib / joblib
Computing with Python functions.
☆4,379 · Updated this week
python-streamz / streamz
Real-time stream processing for python
☆1,302 · Apr 7, 2026 · Updated 3 months ago
rapidsai / cudf
cuDF - GPU DataFrame Library
☆9,715 · Updated this week
xhochy / fletcher
Pandas ExtensionDType/Array backed by Apache Arrow
☆232 · Feb 22, 2023 · Updated 3 years ago
dask / dask-labextension
JupyterLab extension for Dask
☆328 · Jun 2, 2025 · Updated last year
holoviz / hvplot
A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
☆1,355 · Jul 13, 2026 · Updated last week
holoviz / datashader
Quickly and accurately render even the largest data.
☆3,559 · Updated this week
uber / petastorm
Petastorm library enables single machine or distributed training and evaluation of deep learning models from datasets in Apache Parquet f…
☆1,888 · Jan 2, 2026 · Updated 6 months ago
pydata / patsy
Describing statistical models in Python using symbolic formulas
☆988 · Updated this week