Wittline / wbz
A parallel implementation of the bzip2 data compressor in Python. The compression pipeline uses algorithms such as the Burrows–Wheeler transform (BWT) and move-to-front (MTF) to improve Huffman compression. For now, the tool focuses only on compressing .csv files and other files in tabular format.
☆14Updated 2 years ago
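As a rough illustration of the two preprocessing stages named in the description, here is a minimal sketch of a naive BWT followed by MTF. It is not taken from wbz: the function names (`bwt`, `inverse_bwt`, `mtf_encode`, `mtf_decode`) are hypothetical, the rotation sort is quadratic and only suitable for small inputs, and the parallelism and Huffman stage are omitted.

```python
from typing import List, Tuple


def bwt(data: bytes) -> Tuple[bytes, int]:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Returns the last column and the row index of the original string.
    """
    n = len(data)
    if n == 0:
        return b"", 0
    # Sort rotation start offsets by the rotation they denote (demo-grade cost).
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)


def inverse_bwt(last: bytes, primary: int) -> bytes:
    """Invert bwt() using a stable sort of the last column."""
    n = len(last)
    if n == 0:
        return b""
    # perm[j] is the row whose rotation starts one position after row j's.
    perm = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = perm[row]
        out.append(last[row])
    return bytes(out)


def mtf_encode(data: bytes) -> List[int]:
    """Move-to-front: emit each byte's current position, then move it to the front."""
    alphabet = list(range(256))
    out = []
    for b in data:
        idx = alphabet.index(b)
        out.append(idx)
        alphabet.insert(0, alphabet.pop(idx))
    return out


def mtf_decode(codes: List[int]) -> bytes:
    """Invert mtf_encode() by replaying the same alphabet updates."""
    alphabet = list(range(256))
    out = bytearray()
    for idx in codes:
        b = alphabet.pop(idx)
        out.append(b)
        alphabet.insert(0, b)
    return bytes(out)


if __name__ == "__main__":
    last, primary = bwt(b"banana")
    codes = mtf_encode(last)
    print(last, primary, codes)
    assert inverse_bwt(mtf_decode(codes), primary) == b"banana"
```

The BWT tends to group identical bytes into runs, and MTF turns those runs into small integers (mostly zeros), which gives an entropy coder such as Huffman a more skewed distribution to work with.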
Related projects:
- Demo on how to use Prefect with Docker☆26Updated 2 years ago
- Intro to Polars Tutorial☆19Updated last year
- A repo of Flyte-related conference talks☆13Updated 6 months ago
- Demo of Hydra☆18Updated 2 years ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics …☆17Updated 2 years ago
- The goal of this project is to identify students at risk of dropping out of school☆22Updated 3 years ago
- Challenge Data Engineer☆25Updated 2 years ago
- Some example projects for Data Engineers to build, end-to-end.☆26Updated 10 months ago
- ☆11Updated 2 years ago
- ☆12Updated last month
- Demo on how to use Prefect 2 in an ML project☆40Updated last year
- ☆22Updated 2 months ago
- Operations Research Algorithms☆17Updated 6 months ago
- Create a local dashboard to visualize and filter your GitHub feed☆29Updated 2 years ago
- A Probabilistic Programming Language in 70 lines of Python. Code for the blog post https://mrandri19.github.io/2022/01/12/a-PPL-in-70-lin…☆17Updated 2 years ago
- SKIP for AI☆20Updated 4 years ago
- Udacity Data Streaming Nanodegree Program☆22Updated 3 years ago
- Powerful rapid automatic EDA and feature engineering library with a very easy to use API 🌟☆53Updated 2 years ago
- PyCon Talks 2022 by Antoine Toubhans☆23Updated 2 years ago
- Deploy A/B testing infrastructure in a containerized microservice architecture for Machine Learning applications.☆39Updated last year
- This repository will help you learn Databricks concepts through examples. It will include all the important topics which…☆91Updated last month
- Creating a modern data pipeline using a combination of Terraform, AWS Lambda and S3, Snowflake, DBT, Mage AI, and Dash.☆12Updated last year
- Simple demonstration of interactions between a streamlit app and the mlflow tracking api☆20Updated 3 years ago
- Simplify Big Data Analytics with Amazon EMR, published by Packt☆14Updated last year
- A library to create lore plots (logistic regression of the prevalence of a categorical variable as a function of a continuous feature)☆15Updated 3 weeks ago
- ☆17Updated 4 months ago
- In this project, we are going to use a random forest algorithm (or any other preferred algorithm) from scikit-learn library to help predi…☆9Updated 3 years ago
- Use Multiple Linear Regression, Python, Pandas, and Matplotlib to analyze the lifetime value and the key factors of the ‘Telco Customer C…☆10Updated 4 years ago
- ☆25Updated 2 years ago
- The goal of this project is to offer an AWS EMR template using Spot Fleet and On-Demand Instances that you can use quickly. Just focus on…☆25Updated 2 years ago