Wittline / wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses the Burrows–Wheeler transform (BWT) and move-to-front (MTF) coding to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
☆13Updated 2 years ago
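As a rough illustration of the first two stages named in the description, here is a minimal sketch of a BWT followed by move-to-front coding. It is not taken from the wbz source: the function names `bwt` and `move_to_front` are hypothetical, the BWT uses naive rotation sorting rather than whatever wbz actually does, and the Huffman stage and the parallel chunking are left out.

```python
def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform (hypothetical helper, not wbz's API).

    Sorts all rotations of the input and returns the last column together
    with the row index of the original string, which a decoder would need.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they produce.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    # The last byte of the rotation starting at i is data[i - 1].
    last_column = bytes(data[(i - 1) % n] for i in order)
    return last_column, order.index(0)


def move_to_front(data: bytes) -> list[int]:
    """Move-to-front coding over the 256-value byte alphabet."""
    alphabet = list(range(256))
    out = []
    for byte in data:
        idx = alphabet.index(byte)
        out.append(idx)
        # Move the symbol just seen to the front of the alphabet.
        alphabet.insert(0, alphabet.pop(idx))
    return out


transformed, primary_index = bwt(b"banana")
codes = move_to_front(transformed)
```

The BWT tends to group identical bytes together, and MTF then turns each repeated byte into a zero, which gives a skewed symbol distribution that an entropy coder such as Huffman can encode compactly.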
Alternatives and similar repositories for wbz:
Users who are interested in wbz compare it to the repositories listed below.
- Demo of the DuckDB Spark API implementation. Same PySpark code, but DuckDB under the hood☆13Updated last year
- Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture☆11Updated 2 years ago
- Demo on how to use Prefect with Docker☆25Updated 2 years ago
- ☆11Updated 2 years ago
- Challenge Data Engineer☆25Updated 2 years ago
- A Probabilistic Programming Language in 70 lines of Python. Code for the blog post https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lin…☆17Updated 3 years ago
- Demo of Hydra☆18Updated 3 years ago
- Repository for the code assignment of the Deep Learning 1 course, Fall 2021 edition☆10Updated 2 years ago
- The goal of this project is to identify students at risk of dropping out of school☆22Updated 3 years ago
- SDSU Data Science Symposium 2024 - Docker Workshop☆39Updated last year
- ☆17Updated 10 months ago
- Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag☆24Updated 2 years ago
- A collection of my favorite tech-related blog posts.☆9Updated last week
- ☆8Updated 9 months ago
- A Python library for reading and manipulating genetic data.☆22Updated 4 months ago
- This is a capstone project associated with MLOps Zoomcamp. The end goal of the project is to build an end-to-end machine learning projec…☆13Updated 2 years ago
- Exploring the classical regression capabilities of LLMs.☆18Updated 10 months ago
- A framework for simulating e-commerce data and interactions that can be used to build recommendation systems☆10Updated last year
- ☆10Updated 2 years ago
- PyCon Talks 2022 by Antoine Toubhans☆23Updated 2 years ago
- ☆11Updated 3 years ago
- Simplify Big Data Analytics with Amazon EMR, published by Packt☆13Updated 2 years ago
- ☆10Updated last year
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟☆53Updated 3 years ago
- DuckDB Extension for reading and writing FASTA and FASTQ Files☆21Updated last year
- ☆18Updated 4 years ago
- Cost Efficient Data Pipelines with DuckDB☆51Updated 8 months ago
- Operations Research Algorithms☆17Updated last year
- Public notebooks and datasets to accompany the Data Analysis with Polars course on Udemy☆43Updated last year
- Code for data quality with greatexpectations blog☆12Updated 8 months ago