Finding Potential Unhappy Employees

Summary of the Project
Problem
Solution
Process
Further Questions

Summary of the Project

Solved the Company X's problem by preforming the following tasks below:

Designed a dashboard utilizing Figma
Brainstormed key data points that will be important for the visualization
Data Cleaned, Data Wrangled, and Transformed Data using MySql
Visualized the data using Tableau by creating a dashboard that solves the problem

Problem

Company X is currently experiencing problems with their HR data. They currently lack any insight on who is unhappy regarding their salary. Please identify the employee's that might ask for a raise. Please keep in mind that the dataset needs to be cleaned before visualizing the data. The dataset contains six tables and contains numerous errors.

Link to Dataset:

MySQL :: Employees Sample Database

For legal information, see the Legal Notices. For help with using MySQL, please visit the MySQL Forums, where you can discuss your issues with other MySQL users. Document generated on: 2021-09-16 (revision: 70812)

dev.mysql.com

Picture of the Tables:

Solution

public.tableau.com

Process

1) Brainstorming - MySql

Thought Process:

When I looked at all of the columns, I thought of the possible indicators on why someone would ask for a raise.

Why someone would ask for a raise?

High feedback
Unhappy
Underpaid

I thought if there was any way to find out if someone was underpaid for their role. I thought we could use the average salary over the title to find the average.

I would write something like this below:

round(avg(s.salary)
				OVER (PARTITION BY t.title),2)

Once I figured out how to find the average salary per position, I knew I could piece everything together. Since I could see the end goal, I'd like to write out an imaginary SQL code of the end result.

Imaginary Code

SELECT
emp_no,
first_name,
last_name,
title,
department, 
salary, 
avg salary per position
FROM
imaginarytable

2) Executing - Data Wrangling / Data Cleaning / Data Transformation - MySql

Thought Process:

Once I had a clear picture of the end result, I had a picture of the starting table. The employee's table should be the starting point since it contains most of the valuable information.

I knew that I had to join the following tables to the employee table below:

Salary
Titles
Department Name
Department Number

It looks like there's a loss of information in the salary table. I tried joining employee.emp_no to the salaries.emp_no with an inner join but I noticed that there was a loss of data. I found that the loss came from the salaries table, so I need to make a case statement to account for it later.

Employee Table

Salaries Table

As shown in the Salaries table above, you can tell that there are multiple cases where emp_no are shown.

We can make this assumption "Employees can receive promotions in this company" based on the following factors below:

There are duplicate emp_no's
Each emp_no salary increases as the year date goes on
The from_date to to_date correlate with each other
It looks like 9999-01-01 is the latest date possible

Based on this information, we could also make the assumption that 9999-01-01 means that they are currently employed.

The data that we need extract is the emp_no's with the to_date = 9999-01-01. One way we can do that is by writing a simple filter like below.

WHERE

to_date = 9999-01-01

While that's a way to get the data, a real world scenario is to account for unemployed data. In order to account for both employed and unemployed data, we could use the following window function.

rank ()
		OVER (PARTITION BY emp_no ORDER BY to_date DESC)

We could use this sub query to give a rank to all of the emp_no's with the highest to_date number. This is shown by the code below.

Inner sub query

SELECT
	emp_no,
	salary,
	to_date,
	rank ()
		OVER (PARTITION BY emp_no ORDER BY to_date DESC)
FROM
	salaries

Once we have this sub query, we could write another sub query that we can use as the salary. In addition to the original columns, we are also adding an employment status.

Outer sub query

 SELECT
    rnk.emp_no,
    rnk.salary,
    rnk.to_date,
    CASE
        WHEN rnk.to_date < current_date()
            THEN 'NO LONGER EMPLOYED'
        WHEN rnk.to_date >= current_date()
            THEN 'EMPLOYED'
        ELSE 'UNKNOWN'
    END AS "employment_status" 
    FROM
        (
        SELECT
        emp_no,
        salary,
        to_date,
        rank ()
            OVER (PARTITION BY emp_no ORDER BY to_date DESC) AS "rnk"
        FROM salaries
        ) rnk
    WHERE
    rnk.rnk = 1

Further Data Cleaning

In order to account for the loss of data in the salary section, we needed to account for the nulls. The amount of nulls is too big to ignore. For the characters that are missing in the data, we want to call it UNKNOWN. We also do not care about the average salary for unemployed data so can be 0.00.

    IFNULL(t.title,'UNKNOWN') AS "title",
    IFNULL(s.salary,'UNKNOWN') AS "salary",
    IFNULL(d.dept_name,'UNKNOWN') AS "dept_name",
    IFNULL(s.employment_status,'UNKNOWN') AS "employment_status"

  CASE 
		WHEN s.employment_status = 'EMPLOYED' 
			THEN round(avg(CASE WHEN s.employment_status = 'EMPLOYED' THEN s.salary ELSE 0 END)
				OVER (PARTITION BY t.title),2)
					ELSE 0.00

Other additions in the code below:

Order By emp_no ASC

This is to get first emp_no

This should be the final query to the full results. I would use this code as a CTE or a view incase Company X wants to see the unknown employees.

SELECT
    e.emp_no,
    e.first_name,
    e.last_name,
    e.gender, 
    e.birth_date,
    IFNULL(t.title,'UNKNOWN') AS "title",
    IFNULL(s.salary,'UNKNOWN') AS "salary",
    IFNULL(d.dept_name,'UNKNOWN') AS "dept_name",
    IFNULL(s.employment_status,'UNKNOWN') AS "employment_status",
    CASE 
		WHEN s.employment_status = 'EMPLOYED' 
			THEN round(avg(CASE WHEN s.employment_status = 'EMPLOYED' THEN s.salary ELSE 0 END)
				OVER (PARTITION BY t.title),2)
					ELSE 0.00 
	END AS avg_salary
FROM employees e
LEFT OUTER JOIN
    (
    SELECT
    rnk.emp_no,
    rnk.salary,
    rnk.to_date,
    CASE
        WHEN rnk.to_date < current_date()
            THEN 'NO LONGER EMPLOYED'
        WHEN rnk.to_date >= current_date()
            THEN 'EMPLOYED'
        ELSE 'UNKNOWN'
    END AS "employment_status" 
    FROM
        (
        SELECT
        emp_no,
        salary,
        to_date,
        rank ()
            OVER (PARTITION BY emp_no ORDER BY to_date DESC) AS "rnk"
        FROM salaries
        ) rnk
    WHERE
    rnk.rnk = 1
    ) s
    ON e.emp_no = s.emp_no
LEFT OUTER JOIN titles t
    ON e.emp_no = t.emp_no
        AND t.to_date = s.to_date
LEFT OUTER JOIN dept_emp de
    ON e.emp_no = de.emp_no
        AND de.to_date = s.to_date
LEFT OUTER JOIN departments d
    ON d.dept_no = de.dept_no
ORDER BY
    e.emp_no ASC;

3) Brainstorming - Figma

This part is going to be short because I wanted to emphasize my SQL skills with this project.

I still wanted to do something cool and unique for this project because I enjoy the design aspect of dashboarding. For the last two years, "Dark Mode" versions of applications became really popular and visually appealing. I wanted to take my crack at it. I'm pretty new to design, so it's in your best interest to read this guide on it instead.

Guide on Dark mode

Material Design

Build beautiful, usable products faster. Material Design is an adaptable system-backed by open-source code-that helps teams build high quality digital experiences.

material.io

The first thing that I did was brainstorm the graphs that I wanted to add to the dashboard. I made a draft dashboard based on these graphs and filters.

I knew that I wanted the following sections below:

Pie Graph that shows total employees
Bar Graph for underpaid employees
Drill down horizontal bar graph of "Department → Employees"
Male / Female Filter
Title
Description
Emoji's and links

Once I had those sections planned out, I just made the template for this dashboard using Tableau

After I had the dashboard planned out, I designed it out on Figma.

Figma Design

3) Executing - Figma / Tableau

This part was really easy. It basically came down to making the sheets that I need and putting it in the the right area. I mastered containers in Tableau, so it's really easy for me to build clean looking dashboards.

You can tell the difference between someone good at Tableau vs someone who just picked it up by looking at their "Item Hierarchy".

Item Hierarchy

Final Deliverable

Further Questions

After completing this project, there was a couple things that I thought would be interesting.

Thought would be interesting to have the max salary per position.
I could also add a button that shows more detailed information regarding underpaid umployees
It would be interesting to find the percentage of employees that would leave if they are underpaid.

I might actually work on this and try to find the p-value.