Wittline / wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses the Burrows–Wheeler transform (BWT) and move-to-front (MTF) encoding to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
☆13 Updated 2 years ago
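The description above names two preprocessing stages, BWT and MTF, that run before Huffman coding. The following is a minimal sketch of those two stages only, not the code used in wbz: the function names are hypothetical, the BWT is the naive sort-all-rotations version (far too slow for real blocks), and the run-length, Huffman, and parallel stages are omitted.

```python
def bwt(data: bytes):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Returns the transformed bytes and the row index of the original string,
    which the inverse transform needs.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they produce (illustrative only).
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last_column = bytes(data[(i - 1) % n] for i in order)
    return last_column, order.index(0)


def inverse_bwt(last_column: bytes, primary: int) -> bytes:
    """Invert the transform using a stable sort of the last column."""
    n = len(last_column)
    if n == 0:
        return b""
    # next_row[j] is the row whose last byte is the first byte of row j.
    next_row = sorted(range(n), key=lambda i: last_column[i])
    out = bytearray()
    row = next_row[primary]
    for _ in range(n):
        out.append(last_column[row])
        row = next_row[row]
    return bytes(out)


def mtf_encode(data: bytes):
    """Move-to-front: emit each byte's current position, then move it to the front."""
    alphabet = list(range(256))
    indices = []
    for byte in data:
        idx = alphabet.index(byte)
        indices.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return indices


def mtf_decode(indices) -> bytes:
    """Inverse of mtf_encode."""
    alphabet = list(range(256))
    out = bytearray()
    for idx in indices:
        byte = alphabet.pop(idx)
        out.append(byte)
        alphabet.insert(0, byte)
    return bytes(out)


if __name__ == "__main__":
    transformed, primary = bwt(b"banana")
    ranks = mtf_encode(transformed)
    print(transformed, primary, ranks)
    assert inverse_bwt(mtf_decode(ranks), primary) == b"banana"
```

BWT tends to group identical bytes into runs, and MTF turns those runs into small integers (mostly zeros), which is what makes the subsequent Huffman stage more effective.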
Alternatives and similar repositories for wbz
Users that are interested in wbz are comparing it to the libraries listed below
- Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture ☆11 Updated 2 years ago
- Demo on how to use Prefect with Docker ☆25 Updated 2 years ago
- Engaging visualisations, made easy. ☆14 Updated 9 months ago
- Demo of DuckDB's Spark API implementation. Same PySpark code, but DuckDB under the hood ☆14 Updated last year
- Operations Research Algorithms ☆17 Updated last year
- A framework for simulating e-commerce data and interactions that can be used to build recommendation systems ☆10 Updated last year
- A Probabilistic Programming Language in 70 lines of Python. Code for the blog post https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lin… ☆17 Updated 3 years ago
- A repo of Flyte-related conference talks ☆14 Updated last year
- Demo of Hydra ☆18 Updated 3 years ago
- ☆11 Updated 2 years ago
- ☆20 Updated 2 years ago
- Demo repository to lambda-fy your dbt runs ☆11 Updated last year
- Small python library solely for quick Quarto extensions ☆21 Updated last year
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ☆20 Updated 3 years ago
- Code to help generate SQL for stakeholders. Code at https://www.startdataengineering.com/post/data-democratize-llm/ ☆11 Updated 11 months ago
- Pandas ExtensionDtypes for dealing with genomics data ☆47 Updated 6 months ago
- Python implementation of Age-Partitioned Bloom Filter with S3 periodic backup support. ☆11 Updated 3 months ago
- SDSU Data Science Symposium 2024 - Docker Workshop ☆40 Updated last year
- Exon is an OLAP query engine specifically for biology and life science applications. ☆63 Updated last month
- Investigation for PyDataLondon 2023 and ODSC 2023 conference comparing Pandas 2, Polars and Dask ☆11 Updated last year
- Instant search for and access to many datasets in Pyspark. ☆34 Updated 2 years ago
- ☆18 Updated 4 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟 ☆53 Updated 3 years ago
- JumpSpark - A modern cookiecutter template for pyspark projects with batteries included. ☆10 Updated 2 years ago
- PyCon Talks 2022 by Antoine Toubhans ☆23 Updated 2 years ago
- DuckDB Extension for reading and writing FASTA and FASTQ Files ☆21 Updated last year
- Implementation of LSTM for detecting regions of Neanderthal introgression in modern human genomes ☆9 Updated 5 years ago
- DuckDB Extension for working with bioinformatic data. ☆16 Updated last year
- Self-exploratory Streamlit app to know more about palmer penguins. ☆11 Updated last year
- ☆26 Updated 2 years ago