yennanliu/NYC_Taxi_Trip_Duration

yennanliu / NYC_Taxi_Trip_Duration

Develop ML models to predict taxi trip duration in NYC. Ranked: Top 6% | RMSLE: 0.377 (Kaggle) | #DS

☆17

Alternatives and similar repositories for NYC_Taxi_Trip_Duration

Users who are interested in NYC_Taxi_Trip_Duration are comparing it to the repositories listed below.

yennanliu / NYC_Taxi_Pipeline
Stream/batch system with Hadoop, Spark on NYC taxi data | #DE
☆26 | Apr 10, 2026 | Updated 3 months ago
cvilla87 / PySpark-ETL-Telecom
Jupyter Notebook showing how to process Telecom datasets using PySpark (SparkSQL and DataFrames) and plotting the results using Matplotli…
☆17 | Dec 3, 2018 | Updated 7 years ago
rvilla87 / ETL-PySpark
ETL (Extract, Transform and Load) with the Spark Python API (PySpark) and Hadoop Distributed File System (HDFS)
☆17 | Dec 18, 2018 | Updated 7 years ago
rss161030 / ETL-processes-using-Sqoop-Hadoop-Hive-Spark-and-Scala
I implemented various ETL processes like loading the data using Sqoop from MySQL to HDFS, transforming the data using Spark and Scala, perfo…
☆10 | Oct 20, 2017 | Updated 8 years ago
vanessaaleung / data-science-notes
Data Science Learning Notes
☆11 | Oct 18, 2023 | Updated 2 years ago
Bhanu-12 / AppliedAI
Solving the assignments and projects from https://www.appliedaicourse.com/
☆12 | Jul 31, 2019 | Updated 6 years ago
vuthanhhai2302 / Applied-Pyspark
My applied big data analytics project with PySpark.
☆10 | Sep 21, 2022 | Updated 3 years ago
cevoaustralia / glue-vscode
Local Development of AWS Glue with Docker and Visual Studio Code
☆14 | Nov 29, 2021 | Updated 4 years ago
antimoz-om / Antimoz
A data engineering pipeline for digital marketers.
☆11 | Dec 21, 2018 | Updated 7 years ago
DIYBigData / spark-data-analysis-projects
A collection of data analysis projects done using PySpark via Jupyter notebooks.
☆10 | Oct 8, 2022 | Updated 3 years ago
itversity / retail_db_json
☆14 | Sep 14, 2021 | Updated 4 years ago
alvarobartt / covid-daily
🦠 COVID-19 Daily Data from Worldometers with Python
☆13 | Feb 28, 2021 | Updated 5 years ago
shubhamgosain / twitter-Sentiment-Analysis-using-hadoop
A project where one can fetch and read tweets and show analysis such as who is most influential
☆29 | Oct 27, 2023 | Updated 2 years ago
RishiSankineni / Machine-Learning-Pipeline-LR-Pyspark
Power Plant ML Pipeline Application - Apache Spark
☆12 | Dec 12, 2016 | Updated 9 years ago
prakashdontaraju / google-cloud-ecommerce
ecommerce GCP Streaming pipeline: Cloud Storage, Compute Engine, Pub/Sub, Dataflow, Apache Beam, BigQuery and Tableau; GCP Batch pipelin…
☆11 | Mar 9, 2022 | Updated 4 years ago
minzhang-1 / PointHop-PointHop2_Spark
A fast and low memory requirement version of PointHop and PointHop++, which is built upon Apache Spark.
☆10 | Jul 14, 2020 | Updated 6 years ago
mateuspicanco / project-atlas-sao-paulo
A project for the development of rich geospatial data from the city of São Paulo for use in Machine Learning models.
☆12 | Jul 4, 2021 | Updated 5 years ago
MLWhiz / Spark_Projects
Spark Projects for the Berkeley Data Science Course
☆13 | Aug 12, 2015 | Updated 10 years ago
cxysteven / MapBJ
☆12 | Apr 13, 2017 | Updated 9 years ago
vsouza / spark-kinesis-redshift
Example project for consuming AWS Kinesis streaming and saving data on Amazon Redshift using Apache Spark
☆11 | May 22, 2018 | Updated 8 years ago
camposvinicius / gcp-etl
This is a pipeline of an ETL application in GCP with open airport code data, which you can find here: https://datahub.io/core/airport-cod…
☆15 | Nov 15, 2021 | Updated 4 years ago
ZJ-LEARN / MVMT-STN
☆10 | Dec 21, 2021 | Updated 4 years ago
ketgo / marshmallow-pyspark
Marshmallow serializer integration with pyspark
☆12 | Dec 29, 2023 | Updated 2 years ago
nballou / nickballousite
A revamped version of my website, made with Wowchemy, Hugo Academic, and deployed with Netlify.
☆13 | Mar 13, 2026 | Updated 4 months ago
boupetch / rsleep
Sleep Data Analysis with R
☆16 | Jun 2, 2024 | Updated 2 years ago
zekeriyyaa / Traffic-Data-Analysis-with-Apache-Spark-Based-on-Mobile-Robot-Data
Mobile robot data were analyzed with Apache Spark to extract five different statistical results such as travel time, waiting time, average…
☆15 | Apr 5, 2022 | Updated 4 years ago
martandsingh / ApacheSpark
This repository will help you learn Databricks concepts with the help of examples. It will include all the important topics which…
☆105 | Sep 26, 2025 | Updated 9 months ago
ibaiGorordo / Tensorflow-Mobile-Generic-Object-Localizer
Python Tensorflow 2 scripts for detecting objects of any class in an image without knowing their label.
☆16 | Sep 18, 2021 | Updated 4 years ago
xdxuyang / ALRDC
☆16 | Nov 2, 2020 | Updated 5 years ago
zekeriyyaa / PySpark-Structured-Streaming-ROS-Kafka-ApacheSpark-Cassandra
Structured streaming was applied to robot data from the ROS-Gazebo simulation environment using Apache Spark. Data is collected in Kafk…
☆19 | Feb 6, 2022 | Updated 4 years ago
chaithanya21 / Sentiment-Analysis-using-Pyspark-on-Multi-Social-Media-Data
In this mini-project I have chosen to do sentiment analysis of social media websites such as Twitter and Reddit to gain insights into the…
☆12 | Mar 5, 2020 | Updated 6 years ago
hyunjoonbok / PySpark
PySpark functions and utilities with examples. Assists ETL process of data modeling
☆103 | Dec 3, 2020 | Updated 5 years ago
innat / Transfer-Learning-PySpark
Multi-Class Classification | Transfer Learning With PySpark
☆13 | Nov 12, 2019 | Updated 6 years ago
Martialhimanshu / GaanaSuno
GaanaSuno is an application that lets users upload, store and play all of their music from the cloud. Additionally, a user can comment and…
☆12 | Aug 18, 2018 | Updated 7 years ago
shangzongjiang / SNAS4MTF
☆14 | Dec 11, 2024 | Updated last year
MHassaanButt / Flight-Delays-Prediction
In this project, I used Decision Tree Learning Model as the main algorithm to build the model. Due to the big amount of flight data, we i…
☆12 | Dec 21, 2021 | Updated 4 years ago
ehsanmok / sparkling-titanic
Training models with Apache Spark, PySpark for Titanic Kaggle competition
☆14 | Sep 23, 2016 | Updated 9 years ago
yennanliu / utility_shell
Collection of shell/Bash scripts for various use cases | #SE
☆11 | Jul 10, 2026 | Updated 2 weeks ago
Divergent-Insights / dbt-dataquality
Creates simple data models on Snowflake to report dbt source freshness and tests
☆30 | Jun 14, 2023 | Updated 3 years ago