Wittline / wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses the Burrows–Wheeler transform (BWT) and move-to-front (MTF) encoding to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
☆13Updated 2 years ago
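The BWT and MTF stages named in the description can be illustrated with a minimal sketch. This is not the wbz code: the function names are hypothetical, the rotation sort is the naive textbook version, and both the parallel block handling and the Huffman stage are left out. It only shows why the two transforms help an entropy coder: BWT groups similar bytes together, and MTF turns those runs into many small integers.

```python
from __future__ import annotations


def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Returns the last column and the row index of the original string.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    doubled = data + data  # doubled[i:i + n] is the rotation starting at i
    order = sorted(range(n), key=lambda i: doubled[i:i + n])
    last = bytes(doubled[i + n - 1] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert bwt() using a stable sort of the last column."""
    n = len(last)
    if n == 0:
        return b""
    # nxt[i] is the row whose rotation starts one position after row i's.
    nxt = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = nxt[primary]
    for _ in range(n):
        out.append(last[row])
        row = nxt[row]
    return bytes(out)


def mtf_encode(data: bytes) -> list[int]:
    """Move-to-front: emit each byte's current position, then move it to the front."""
    alphabet = list(range(256))
    out = []
    for b in data:
        idx = alphabet.index(b)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return out


def mtf_decode(indices: list[int]) -> bytes:
    """Inverse of mtf_encode()."""
    alphabet = list(range(256))
    out = bytearray()
    for idx in indices:
        b = alphabet.pop(idx)
        out.append(b)
        alphabet.insert(0, b)
    return bytes(out)


last, primary = bwt(b"banana")
print(last, primary)     # b'nnbaaa' 3
print(mtf_encode(last))  # [110, 0, 99, 99, 0, 0]
```

Repeated bytes in the BWT output become zeros after MTF, which is the skewed symbol distribution a Huffman coder can exploit. Tabular data such as CSV, with repeated delimiters and column values, would be expected to benefit from this, though the sketch makes no claim about actual compression ratios.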
Alternatives and similar repositories for wbz:
Users who are interested in wbz are comparing it to the repositories listed below
- Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture☆11Updated 2 years ago
- Demo of Hydra☆18Updated 2 years ago
- A Probabilistic Programming Language in 70 lines of Python. Code for the blog post https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lin…☆17Updated 3 years ago
- A library to create lore plots (logistic regression of the prevalence of a categorical variable as a function of a continuous feature)☆16Updated 2 weeks ago
- Demo on how to use Prefect with Docker☆25Updated 2 years ago
- A framework for simulating e-commerce data and interactions that can be used to build recommendation systems☆10Updated last year
- ☆20Updated 2 years ago
- ☆8Updated 8 months ago
- Cost Efficient Data Pipelines with DuckDB☆49Updated 6 months ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics …☆20Updated 3 years ago
- Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag☆24Updated 2 years ago
- Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market☆56Updated 2 years ago
- A neural network hyper parameter tuner☆30Updated last year
- Challenge Data Engineer☆25Updated 2 years ago
- A repo of Flyte-related conference talks☆14Updated 11 months ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟☆53Updated 3 years ago
- Operations Research Algorithms☆17Updated 11 months ago
- The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on…☆26Updated 2 years ago
- ☆13Updated last year
- ☆19Updated 7 months ago
- ☆18Updated 9 months ago
- Public Repo of my machine learning project to predict home prices☆12Updated 4 years ago
- ☆11Updated 2 years ago
- A Python library for reading and manipulating genetic data.☆22Updated 3 months ago
- CSV and flat-file sniffer built in Rust.☆42Updated last year
- ☆11Updated 2 years ago
- ☆21Updated 2 years ago
- ☆20Updated 4 years ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/☆11Updated 8 months ago