Wittline / wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses the Burrows–Wheeler transform (BWT) and move-to-front (MTF) encoding to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
☆13 Updated 2 years ago
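The BWT and MTF stages named in the description can be illustrated with a minimal sketch. This is not the wbz code: the function names (`bwt`, `inverse_bwt`, `mtf_encode`, `mtf_decode`) are hypothetical, the BWT uses a naive sorted-rotations approach instead of the block sorting a real bzip2 implementation would use, and the run-length and Huffman stages are left out.

```python
from typing import List, Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform using sorted rotations.

    Returns the last column of the sorted rotation matrix and the row
    index of the original string (needed to invert the transform).
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they produce.
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert the BWT given the last column and the original row index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte value maps first-column rows
    # to their positions in the last column.
    order = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = order[row]
        out.append(last[row])
    return bytes(out)


def mtf_encode(data: bytes) -> List[int]:
    """Move-to-front: emit each byte's current index, then move it to the front."""
    alphabet = list(range(256))
    codes = []
    for byte in data:
        idx = alphabet.index(byte)
        codes.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return codes


def mtf_decode(codes: List[int]) -> bytes:
    """Inverse of mtf_encode."""
    alphabet = list(range(256))
    out = bytearray()
    for idx in codes:
        byte = alphabet.pop(idx)
        out.append(byte)
        alphabet.insert(0, byte)
    return bytes(out)


if __name__ == "__main__":
    row = b"id,city,city,city"  # made-up CSV-like sample
    last, primary = bwt(row)
    codes = mtf_encode(last)
    # Round trip: undo MTF, then undo BWT.
    assert inverse_bwt(mtf_decode(codes), primary) == row
    print(codes)
```

The BWT tends to group identical bytes together, so the MTF output is dominated by small values such as 0. That skewed distribution is what lets a Huffman coder assign short codes to most symbols, and repetitive tabular data like CSV columns is a natural fit for it.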
Alternatives and similar repositories for wbz:
Users who are interested in wbz compare it to the repositories listed below:
- Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture ☆11 Updated 2 years ago
- Demo of Hydra ☆18 Updated 2 years ago
- A repo of Flyte-related conference talks ☆13 Updated 10 months ago
- Demo on how to use Prefect with Docker ☆25 Updated 2 years ago
- ☆11 Updated 2 years ago
- ☆11 Updated 2 years ago
- Collection of python scripts to demonstrate asynchronous programming in python ☆11 Updated 2 years ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆20 Updated 3 years ago
- A Probabilistic Programming Language in 70 lines of Python. Code for the blog post https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lin… ☆17 Updated 2 years ago
- Intro to Polars Tutorial ☆21 Updated last year
- Challenge Data Engineer ☆25 Updated 2 years ago
- Scheduling Big Data Workloads and Data Pipelines in the Cloud with pyDag ☆24 Updated 2 years ago
- A library to create lore plots (logistic regression of the prevalence of a categorical variable in function of a continuous feature) ☆16 Updated last week
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- ☆19 Updated 8 months ago
- Fast and minimal header-only graph-based index for approximate nearest neighbor search (ANNS). https://flatnav.net ☆10 Updated this week
- Creating a Gradio user interface to predict the sentiment of a tweet ☆12 Updated 3 years ago
- A framework for simulating e-commerce data and interactions that can be used to build recommendation systems ☆10 Updated last year
- Pandas ExtensionDtypes for dealing with genomics data ☆47 Updated 2 months ago
- SparkBLAST is a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computat… ☆9 Updated 7 years ago
- The Baseline Site Selection Tool implements simulation tools for clinical trial enrollment. ☆18 Updated 2 years ago
- Tutorial and examples for using Apache Spark ☆16 Updated 7 years ago
- SDSU Data Science Symposium 2024 - Docker Workshop ☆39 Updated 11 months ago
- Awesome Orchest projects, both official and submitted by the community. ☆25 Updated last year
- ☆20 Updated 2 years ago
- ☆13 Updated last year
- DuckDB Extension for reading and writing FASTA and FASTQ Files ☆20 Updated last year
- Public notebooks and datasets to accompany the Data Analysis with Polars course on Udemy ☆43 Updated last year
- ☆21 Updated last year