cal-data-eng / sp21
Data Engineering Course Website
★14 Updated 11 months ago
Alternatives and similar repositories for sp21
Users who are interested in sp21 are comparing it to the repositories listed below.
- Convert monolithic Jupyter notebooks into maintainable Ploomber pipelines. ★79 Updated 11 months ago
- Lambda Learner is a library for iterative incremental training of a class of supervised machine learning models. ★42 Updated 2 years ago
- Python stream processing for humans ★185 Updated 6 months ago
- hooqu is a library built on top of Pandas-like Dataframes for defining "unit tests for data". This is a spiritual port of Apache Deequ to… ★29 Updated 8 months ago
- Data science and ML with Dask ★14 Updated 4 years ago
- Data pipelines from re-usable components ★107 Updated 2 years ago
- Supporting content (slides and exercises) for the Pearson video series covering best practices for developing scalable applications with … ★52 Updated 7 months ago
- The Open Source Deep Learning Glossary ★37 Updated 5 years ago
- dagster scikit-learn pipeline example. ★46 Updated 2 years ago
- A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics … ★20 Updated 3 years ago
- A tutorial on locality sensitive hashing, using MinHashing for document similarity and CosineSimilarity for Euclidean space similarity. ★33 Updated 4 years ago
- A curated list of example code to collect data from Web APIs using DataPrep.Connector. ★34 Updated 2 years ago
- Vinum is a SQL processor for Python, designed for data analysis workflows and in-memory analytics. ★65 Updated 4 years ago
- Automatically check mismatch between code and comments using AI and ML ★53 Updated 4 years ago
- A data wrangling and modeling tool. ★63 Updated 2 years ago
- Flow with FlorDB ★154 Updated 2 months ago
- A Python-to-SQL transpiler as replacement for Python Pandas ★48 Updated 2 years ago
- Datamallet is a Python library which contains several helper functions and modules for the common tasks in a typical data science workflow… ★11 Updated 3 years ago
- ★30 Updated 3 years ago
- NitroML is a modular, portable, and scalable model-quality benchmarking framework for Machine Learning and Automated Machine Learning (Au… ★43 Updated 4 years ago
- MLOps simplified. One-stop AI delivery platform, all the features you need. ★100 Updated this week
- ★30 Updated last year
- dbd is a database prototyping tool that enables data analysts and engineers to quickly load and transform data in SQL databases. ★57 Updated 3 years ago
- Titus 2 : Portable Format for Analytics (PFA) implementation for Python 3.4+ ★23 Updated 2 years ago
- An LSM-Tree key/value database in Python. ★24 Updated last year
- ★36 Updated last week
- Ray provider for Apache Airflow ★48 Updated last year
- A collection of resources and blogs about Apache Superset ★86 Updated 3 years ago
- Learn Kubeflow with Arrikto ★15 Updated 3 years ago
- ★79 Updated 2 years ago