Wittline / wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses the Burrows–Wheeler transform (BWT) and move-to-front (MTF) coding to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
☆13 · Updated 2 years ago
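The pipeline described above can be illustrated with a minimal, single-threaded sketch of the BWT → MTF → Huffman stages. This is not wbz's code. It assumes text input, uses a NUL character as a hypothetical end-of-block sentinel, sorts rotations naively, and emits bits as a string. It omits the block splitting and parallelism wbz advertises, the run-length encoding stages that real bzip2 also applies, and any on-disk format. All function names are illustrative.

```python
import heapq
from collections import Counter

# Assumption: NUL marks the end of a block and never occurs in the input text.
SENTINEL = "\x00"


def bwt(text: str) -> str:
    """Naive Burrows-Wheeler transform: sort every rotation, keep the last column."""
    if SENTINEL in text:
        raise ValueError("input must not contain the sentinel character")
    s = text + SENTINEL
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(last: str) -> str:
    """Rebuild the sorted rotation table column by column, then pick the original row."""
    table = [""] * len(last)
    for _ in range(len(last)):
        table = sorted(c + row for c, row in zip(last, table))
    original = next(row for row in table if row.endswith(SENTINEL))
    return original[:-1]


def mtf_encode(s: str, alphabet: list[str]) -> list[int]:
    """Replace each symbol by its current index, then move it to the front."""
    table = list(alphabet)
    out = []
    for ch in s:
        idx = table.index(ch)
        out.append(idx)
        table.insert(0, table.pop(idx))
    return out


def mtf_decode(indices: list[int], alphabet: list[str]) -> str:
    """Undo MTF by replaying the same move-to-front updates."""
    table = list(alphabet)
    out = []
    for idx in indices:
        ch = table.pop(idx)
        out.append(ch)
        table.insert(0, ch)
    return "".join(out)


def huffman_code(symbols: list[int]) -> dict[int, str]:
    """Build a prefix-free code (as bit strings) from symbol frequencies."""
    freq = Counter(symbols)
    if len(freq) <= 1:
        # Degenerate case: a single distinct symbol gets a 1-bit code.
        return {sym: "0" for sym in freq}
    # Heap entries: (weight, unique tiebreaker, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {sym: "0" + code for sym, code in left.items()}
        merged.update({sym: "1" + code for sym, code in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def huffman_decode(bits: str, code: dict[int, str]) -> list[int]:
    """Read bits until they match a codeword; valid because the code is prefix-free."""
    reverse = {c: sym for sym, c in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in reverse:
            out.append(reverse[buf])
            buf = ""
    return out


def compress(text: str) -> tuple[str, dict[int, str], list[str]]:
    last = bwt(text)  # groups similar contexts so equal symbols cluster
    alphabet = sorted(set(last))
    indices = mtf_encode(last, alphabet)  # clusters become runs of small integers
    code = huffman_code(indices)  # small, frequent integers get short codes
    bits = "".join(code[i] for i in indices)
    return bits, code, alphabet


def decompress(bits: str, code: dict[int, str], alphabet: list[str]) -> str:
    return inverse_bwt(mtf_decode(huffman_decode(bits, code), alphabet))
```

For example, calling `compress` on a small CSV string and passing the returned bits, code table, and alphabet to `decompress` should reproduce the original text. The MTF step is what makes BWT output useful to Huffman coding: runs of repeated symbols turn into runs of zeros, which skews the frequency distribution toward a few short codewords.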
Alternatives and similar repositories for wbz:
Users who are interested in wbz are comparing it to the libraries listed below.
- ☆11 · Updated 2 years ago
- Demo on how to use Prefect with Docker ☆25 · Updated 2 years ago
- A Probabilistic Programming Language in 70 lines of Python. Code for the blog post https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lin… ☆17 · Updated 3 years ago
- A repo of Flyte-related conference talks ☆14 · Updated last year
- Python implementation of Age-Partitioned Bloom Filter with S3 periodic backup support. ☆11 · Updated 2 months ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆20 · Updated 3 years ago
- Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture ☆11 · Updated 2 years ago
- Achieving true parallelism in Python 3.12 ☆10 · Updated 11 months ago
- SDSU Data Science Symposium 2024 - Docker Workshop ☆39 · Updated last year
- This is a capstone project associated with MLOps Zoomcamp. The end goal of the project is to build an end-to-end machine learning projec… ☆13 · Updated 2 years ago
- ☆17 · Updated 4 years ago
- Pandas ExtensionDtypes for dealing with genomics data ☆47 · Updated 4 months ago
- ☆20 · Updated 2 years ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆11 · Updated 10 months ago
- Use Multiple Linear Regression, Python, Pandas, and Matplotlib to analyze the lifetime value and the key factors of the ‘Telco Customer C… ☆10 · Updated 4 years ago
- ☆13 · Updated 2 years ago
- ☆21 · Updated last year
- Demo repository to lambda-fy your dbt runs ☆11 · Updated last year
- PyCon Talks 2022 by Antoine Toubhans ☆23 · Updated 2 years ago
- Sample project that uses Dagster, dbt, DuckDB and Dash to visualize the Spanish car and motorcycle market ☆57 · Updated 2 years ago
- JumpSpark - A modern cookiecutter template for pyspark projects with batteries included. ☆10 · Updated last year
- Demo of Hydra ☆18 · Updated 2 years ago
- Official code for paper: Conservative objective models are a special kind of contrastive divergence-based energy model ☆14 · Updated last year
- ☆8 · Updated 9 months ago
- Workshop about DVC VSCode Extension ☆14 · Updated 6 months ago
- Collection of Python scripts to demonstrate asynchronous programming in Python ☆11 · Updated 2 years ago
- Predict the number of deaths due to COVID-19 in the next two weeks ☆11 · Updated 2 years ago
- ☆17 · Updated 10 months ago
- Python library for interacting with Dask clusters in Saturn ☆12 · Updated 5 months ago
- Create a local dashboard to visualize and filter your GitHub feed ☆29 · Updated 2 years ago