A tutorial that helps big data engineers ramp up faster by getting familiar with PySpark DataFrames and functions. It also covers topics such as EMR cluster sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.
☆20 · Nov 12, 2021 · Updated 4 years ago
Alternatives and similar repositories for intro-to-colab-pyspark-emr
Users that are interested in intro-to-colab-pyspark-emr are comparing it to the libraries listed below.
- ☆18 · Nov 9, 2025 · Updated 6 months ago
- Source Code for 'Applied Data Science Using PySpark' by Ramcharan Kakarla, Sundar Krishnan, and Sridhar Alla ☆48 · May 18, 2021 · Updated 4 years ago
- To try CTC in Keras ☆19 · Apr 8, 2019 · Updated 7 years ago
- Implementation of Spark code in Jupyter notebook. Topics include: RDDs and DataFrame, exploratory data analysis (EDA), handling multiple … ☆30 · Aug 26, 2020 · Updated 5 years ago
- Example of a Streamlit data app powered by Vaex ☆11 · Jul 7, 2022 · Updated 3 years ago
- sql-for-data-engineering-course ☆18 · May 12, 2023 · Updated 2 years ago
- Simple GUI to load a PDF/Docx/txt file and have LM Studio answer based on it. ☆14 · Jul 31, 2024 · Updated last year
- ☆10 · Oct 17, 2021 · Updated 4 years ago
- ☆34 · Jul 27, 2021 · Updated 4 years ago
- Bank Marketing data classification ☆12 · Oct 2, 2020 · Updated 5 years ago
- Content based Recommendation ☆14 · Jun 23, 2021 · Updated 4 years ago
- Targeted Data Generation with Large Language Models ☆19 · Jun 25, 2024 · Updated last year
- This repo contains a list of questions to practice SQL with the Sakila Database. ☆10 · Jul 29, 2022 · Updated 3 years ago
- Peakrs Dataframe is a library and framework that facilitates the extraction, transformation, and loading (ETL) of data. ☆18 · Oct 26, 2023 · Updated 2 years ago
- pdfplot is a Python library for easily managing your matplotlib figures as PDF files. ☆13 · Jul 8, 2020 · Updated 5 years ago
- A future-proof, opinionated software to manage your life in plaintext: todo, agenda, journal, and notes. ☆23 · Nov 5, 2023 · Updated 2 years ago
- A lightweight open-source package to fine-tune embedding models. ☆22 · Feb 4, 2024 · Updated 2 years ago
- TuneTables is a tabular classifier that implements prompt tuning for frozen prior-fitted networks. ☆24 · Mar 31, 2025 · Updated last year
- https://adventofcode.com/2024 ☆12 · Dec 25, 2024 · Updated last year
- A complement to ANTLR to get a model from your AST and transform it ☆14 · Apr 20, 2020 · Updated 6 years ago
- ☆11 · Aug 13, 2023 · Updated 2 years ago
- ☆19 · Jan 20, 2024 · Updated 2 years ago
- Official TensorFlow code for the paper "DeepWay: a Deep Learning Waypoint Estimator for Global Path Generation". ☆11 · Jun 24, 2022 · Updated 3 years ago
- Privacy-friendly framework for IoT Cloud ☆27 · Dec 8, 2022 · Updated 3 years ago
- Decorators for logging purposes for all your dataframes ☆14 · Jan 31, 2025 · Updated last year
- Simple video/audio player using pyside6/pyqt6 and VLC! ☆25 · Feb 6, 2026 · Updated 3 months ago
- Starter template for python projects ☆18 · Feb 15, 2024 · Updated 2 years ago
- Dockerfile for audiogrep and pocketsphinx ☆12 · Oct 12, 2016 · Updated 9 years ago
- A platform for storing large semantic networks on MongoDB ☆22 · Jun 20, 2011 · Updated 14 years ago
- PyTorch Implementation of A Deep Learning System for Predicting Size and Fit in Fashion E-Commerce (RecSys'19) ☆14 · Aug 23, 2021 · Updated 4 years ago
- A GitHub Action to run a pytest command when new code is pushed into your repo ☆58 · Oct 14, 2025 · Updated 6 months ago
- Repo to host a comprehensive list of all my Public Gists with a short description for each item and a link to the Gist pages in question.… ☆16 · Apr 27, 2021 · Updated 5 years ago
- Cosine Similarity Search in ElasticSearch + FAISS GPU ☆12 · Mar 24, 2022 · Updated 4 years ago
- Web Scraping and Knowledge Graphs with Machine Learning [Guide] ☆10 · Jul 1, 2021 · Updated 4 years ago
- Control flow graph and test requirement generation for Java code. ☆14 · Nov 19, 2014 · Updated 11 years ago
- The repository contains all the work including projects, notes, and articles related to ML Engineering while I am learning. ☆10 · Dec 4, 2022 · Updated 3 years ago
- Ensemble of ARIMA, Prophet, and LSTM RNNs ☆36 · Aug 26, 2017 · Updated 8 years ago
- ☆16 · Jul 13, 2022 · Updated 3 years ago
- Homepage of Software Engineering for Machine Learning ☆17 · Feb 4, 2026 · Updated 3 months ago