2 minute read

This Python project is the second project in Google’s Advanced Data Analytics Course. It focused on performing the EDA process and clean data on a dataset provided by Waze to predict user churn. The scope of this project stage is:

  • Data Exploration
  • Data Cleaning
  • Handle Outliers
  • Building Visualizations
  • Evaluating and sharing results

This project was original created in a GitHub Repository. If you experience any formatting issues, please view the original GitHub project.

Please reference the Waze EDA Project Jupyter Notebook for more detailed analysis, however key insights are shared below.

Current Project Overview

Waze leadership has tasked the data team with developing a machine learning model to predict user churn. This model will help improve user retention and support business growth. The team has already completed the project proposal and initial data processing. Now, the next step is to conduct Exploratory Data Analysis (EDA) to better understand the data and generate visual insights before model development.

  1. Data Exploration
  2. Data Cleaning
  3. Building Visualizations
  4. Evaluating and sharing results

Quick Insights over project

1. Was there anything that led you to believe the data was erroneous or problematic in any way?

Most of the data was validated through the EDA process. However, many variables had very improbable outlier values, such as driven_km_drives, which had a max of 15,420 KM which is impossible to drive in a day.

alt text

It also appears that monthly variables might have mistakes within the data. On activity_days and driving days, one has a max value of 31 days while the other has a max value of 30, which could indicate that the data was not collected within the same month. alt text

2. Did your investigation give rise to further questions that you would like to explore or ask the Waze team about?

Yes, we need to confirm if the data for each variable was collected during the same month. Some values have 30 days, while others have 31 days. It also appears that many long-term members suddenly started using the application in the last month. Why did this occur?

alt text

3. What percentage of users churned and what percentage were retained?

~18% of users churned, and ~82% were retained.

alt text

4. What factors correlated with user churn? How?

Based on the current dataset, KM per Driving Day had a positive correlation with churn rate. The further a user drove had a positive relationship with churn.

alt text

Number of Driving Days had a negative relationship with churn. User’s who drove more days in the month were less likely to churn.

alt text

Conclusion and Acknowledgement

This project is part of Google’s Advanced Data Analytics Professional Certificate Course, specifically as the end-of-course project of the “Go Beyond the Numbers: Translate Data into Insights” sub-course within the broader Google Data Analytics course. All data was provided by Google, and the scenario was created with questions to answer.