Summary of the Project
Solved the Company X's problem by preforming the following tasks below:
- Designed a dashboard utilizing Figma
- Brainstormed key data points that will be important for the visualization
- Data Cleaned, Data Wrangled, and Transformed Data using MySql
- Visualized the data using Tableau by creating a dashboard that solves the problem
Problem
Company X is currently experiencing problems with their HR data. They currently lack any insight on who is unhappy regarding their salary. Please identify the employee's that might ask for a raise. Please keep in mind that the dataset needs to be cleaned before visualizing the data. The dataset contains six tables and contains numerous errors.
Link to Dataset:
Picture of the Tables:
Solution
Process
1) Brainstorming - MySql
Thought Process:
When I looked at all of the columns, I thought of the possible indicators on why someone would ask for a raise.
Why someone would ask for a raise?
- High feedback
- Unhappy
- Underpaid
I thought if there was any way to find out if someone was underpaid for their role. I thought we could use the average salary over the title to find the average.
I would write something like this below:
round(avg(s.salary)
OVER (PARTITION BY t.title),2)
Once I figured out how to find the average salary per position, I knew I could piece everything together. Since I could see the end goal, I'd like to write out an imaginary SQL code of the end result.
Imaginary Code
SELECT
emp_no,
first_name,
last_name,
title,
department,
salary,
avg salary per position
FROM
imaginarytable
2) Executing - Data Wrangling / Data Cleaning / Data Transformation - MySql
Thought Process:
Once I had a clear picture of the end result, I had a picture of the starting table. The employee's table should be the starting point since it contains most of the valuable information.
I knew that I had to join the following tables to the employee table below:
- Salary
- Titles
- Department Name
- Department Number
It looks like there's a loss of information in the salary table. I tried joining employee.emp_no to the salaries.emp_no with an inner join but I noticed that there was a loss of data. I found that the loss came from the salaries table, so I need to make a case statement to account for it later.
Employee Table
Salaries Table
As shown in the Salaries table above, you can tell that there are multiple cases where emp_no are shown.
We can make this assumption "Employees can receive promotions in this company" based on the following factors below:
- There are duplicate emp_no's
- Each emp_no salary increases as the year date goes on
- The from_date to to_date correlate with each other
- It looks like 9999-01-01 is the latest date possible
Based on this information, we could also make the assumption that 9999-01-01 means that they are currently employed.
The data that we need extract is the emp_no's with the to_date = 9999-01-01. One way we can do that is by writing a simple filter like below.
WHERE
to_date = 9999-01-01
While that's a way to get the data, a real world scenario is to account for unemployed data. In order to account for both employed and unemployed data, we could use the following window function.
rank ()
OVER (PARTITION BY emp_no ORDER BY to_date DESC)
We could use this sub query to give a rank to all of the emp_no's with the highest to_date number. This is shown by the code below.
Inner sub query
SELECT
emp_no,
salary,
to_date,
rank ()
OVER (PARTITION BY emp_no ORDER BY to_date DESC)
FROM
salaries
Once we have this sub query, we could write another sub query that we can use as the salary. In addition to the original columns, we are also adding an employment status.
Outer sub query
SELECT
rnk.emp_no,
rnk.salary,
rnk.to_date,
CASE
WHEN rnk.to_date < current_date()
THEN 'NO LONGER EMPLOYED'
WHEN rnk.to_date >= current_date()
THEN 'EMPLOYED'
ELSE 'UNKNOWN'
END AS "employment_status"
FROM
(
SELECT
emp_no,
salary,
to_date,
rank ()
OVER (PARTITION BY emp_no ORDER BY to_date DESC) AS "rnk"
FROM salaries
) rnk
WHERE
rnk.rnk = 1
Further Data Cleaning
In order to account for the loss of data in the salary section, we needed to account for the nulls. The amount of nulls is too big to ignore. For the characters that are missing in the data, we want to call it UNKNOWN. We also do not care about the average salary for unemployed data so can be 0.00.
IFNULL(t.title,'UNKNOWN') AS "title",
IFNULL(s.salary,'UNKNOWN') AS "salary",
IFNULL(d.dept_name,'UNKNOWN') AS "dept_name",
IFNULL(s.employment_status,'UNKNOWN') AS "employment_status"
CASE
WHEN s.employment_status = 'EMPLOYED'
THEN round(avg(CASE WHEN s.employment_status = 'EMPLOYED' THEN s.salary ELSE 0 END)
OVER (PARTITION BY t.title),2)
ELSE 0.00
Other additions in the code below:
- Order By emp_no ASC
- This is to get first emp_no
This should be the final query to the full results. I would use this code as a CTE or a view incase Company X wants to see the unknown employees.
SELECT
e.emp_no,
e.first_name,
e.last_name,
e.gender,
e.birth_date,
IFNULL(t.title,'UNKNOWN') AS "title",
IFNULL(s.salary,'UNKNOWN') AS "salary",
IFNULL(d.dept_name,'UNKNOWN') AS "dept_name",
IFNULL(s.employment_status,'UNKNOWN') AS "employment_status",
CASE
WHEN s.employment_status = 'EMPLOYED'
THEN round(avg(CASE WHEN s.employment_status = 'EMPLOYED' THEN s.salary ELSE 0 END)
OVER (PARTITION BY t.title),2)
ELSE 0.00
END AS avg_salary
FROM employees e
LEFT OUTER JOIN
(
SELECT
rnk.emp_no,
rnk.salary,
rnk.to_date,
CASE
WHEN rnk.to_date < current_date()
THEN 'NO LONGER EMPLOYED'
WHEN rnk.to_date >= current_date()
THEN 'EMPLOYED'
ELSE 'UNKNOWN'
END AS "employment_status"
FROM
(
SELECT
emp_no,
salary,
to_date,
rank ()
OVER (PARTITION BY emp_no ORDER BY to_date DESC) AS "rnk"
FROM salaries
) rnk
WHERE
rnk.rnk = 1
) s
ON e.emp_no = s.emp_no
LEFT OUTER JOIN titles t
ON e.emp_no = t.emp_no
AND t.to_date = s.to_date
LEFT OUTER JOIN dept_emp de
ON e.emp_no = de.emp_no
AND de.to_date = s.to_date
LEFT OUTER JOIN departments d
ON d.dept_no = de.dept_no
ORDER BY
e.emp_no ASC;
3) Brainstorming - Figma
This part is going to be short because I wanted to emphasize my SQL skills with this project.
I still wanted to do something cool and unique for this project because I enjoy the design aspect of dashboarding. For the last two years, "Dark Mode" versions of applications became really popular and visually appealing. I wanted to take my crack at it. I'm pretty new to design, so it's in your best interest to read this guide on it instead.
Guide on Dark mode
The first thing that I did was brainstorm the graphs that I wanted to add to the dashboard. I made a draft dashboard based on these graphs and filters.
I knew that I wanted the following sections below:
- Pie Graph that shows total employees
- Bar Graph for underpaid employees
- Drill down horizontal bar graph of "Department → Employees"
- Male / Female Filter
- Title
- Description
- Emoji's and links
Once I had those sections planned out, I just made the template for this dashboard using Tableau
After I had the dashboard planned out, I designed it out on Figma.
Figma Design
3) Executing - Figma / Tableau
This part was really easy. It basically came down to making the sheets that I need and putting it in the the right area. I mastered containers in Tableau, so it's really easy for me to build clean looking dashboards.
You can tell the difference between someone good at Tableau vs someone who just picked it up by looking at their "Item Hierarchy".
Item Hierarchy
Final Deliverable
Further Questions
After completing this project, there was a couple things that I thought would be interesting.
- Thought would be interesting to have the max salary per position.
- I could also add a button that shows more detailed information regarding underpaid umployees
- It would be interesting to find the percentage of employees that would leave if they are underpaid.
- I might actually work on this and try to find the p-value.