Wittline / wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses algorithms such as the Burrows–Wheeler transform (BWT) and Move to front (MTF) to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
☆13Updated 2 years ago
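To illustrate why BWT and MTF help the Huffman stage, here is a minimal sketch of the two transforms. It is not taken from the wbz source: the function names (`bwt`, `inverse_bwt`, `mtf_encode`, `mtf_decode`) are hypothetical, the BWT is the naive rotation-sorting version rather than anything tuned for large files, and the parallel block handling and the Huffman coder itself are left out.

```python
# Illustrative sketch only; names and structure are assumptions, not the wbz API.

def bwt(data: bytes) -> tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Returns the transformed bytes and the row index of the original string,
    which the inverse transform needs.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they produce (slow but simple).
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Rebuild the original bytes from the last column and the primary index."""
    n = len(last)
    if n == 0:
        return b""
    # A stable sort of positions by byte value links the first column to the last.
    links = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    idx = primary
    for _ in range(n):
        idx = links[idx]
        out.append(last[idx])
    return bytes(out)


def mtf_encode(data: bytes) -> list[int]:
    """Move-to-front: emit each byte's current position, then move it to the front."""
    alphabet = list(range(256))
    out = []
    for b in data:
        idx = alphabet.index(b)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return out


def mtf_decode(codes: list[int]) -> bytes:
    """Inverse of mtf_encode."""
    alphabet = list(range(256))
    out = bytearray()
    for idx in codes:
        b = alphabet.pop(idx)
        out.append(b)
        alphabet.insert(0, b)
    return bytes(out)
```

BWT tends to group identical bytes together, and MTF turns each repeat of the previous byte into a 0, so the output is dominated by a few small values, which is the kind of skewed distribution a Huffman coder compresses well.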
Alternatives and similar repositories for wbz:
Users that are interested in wbz are comparing it to the libraries listed below
- ☆11Updated 2 years ago
- Genomic BigData Warehousing with Apache Spark and LakeHouse Architecture☆11Updated 2 years ago
- Demo on how to use Prefect with Docker☆25Updated 2 years ago
- The goal of this project is to identify students at risk of dropping out of school☆22Updated 3 years ago
- Python implementation of Age-Partitioned Bloom Filter with S3 periodic backup support.☆11Updated 3 months ago
- A Probabilistic Programming Language in 70 lines of Python. Code for the blog post https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lin…☆17Updated 3 years ago
- A repo of Flyte-related conference talks☆14Updated last year
- ☆20Updated 2 years ago
- ☆12Updated 3 years ago
- SparkBLAST is a parallelization of a sequence alignment application (BLAST) that employs cloud computing for the provisioning of computat…☆9Updated 7 years ago
- Demo of Hydra☆18Updated 3 years ago
- A framework for simulating e-commerce data and interactions that can be used to build recommendation systems☆10Updated last year
- ☆8Updated 10 months ago
- Pandas ExtensionDtypes for dealing with genomics data☆47Updated 5 months ago
- High-performance tokenized language data-loader for Python C++ extension☆13Updated 9 months ago
- SDSU Data Science Symposium 2024 - Docker Workshop☆40Updated last year
- Workshop about DVC VSCode Extension☆14Updated 7 months ago
- Operations Research Algorithms☆17Updated last year
- ☆21Updated last year
- ☆17Updated 2 months ago
- ☆13Updated last year
- Demo of the DuckDB Spark API implementation: the same PySpark code, but with DuckDB under the hood☆14Updated last year
- Demo repository to lambda-fy your dbt runs☆11Updated last year
- duckdb-etl-framework☆10Updated 4 months ago
- Tutorials on data science/machine learning using Python with a life-science perspective. When I started learning programming and machine …☆12Updated 2 years ago
- The Baseline Site Selection Tool implements simulation tools for clinical trial enrollment.☆18Updated 2 years ago
- A Machine Learning library for predicting and modelling learner engagement with educational resources☆12Updated last year
- Projects completed under LinuxWorld Informatics Ltd. - MLOps Training.☆12Updated 4 years ago
- Repository with code, notebook and slides for my PyData & PyConDE talk 2023 about Clean Coding practices.☆12Updated 2 years ago
- A FastMCP tool to search and retrieve Polars API documentation.☆46Updated last week