Statistical Analysis with Excel-Chapter7

hajra Begum
3 min readMar 22, 2022

When I started studying Data Analytics and Data science I had to go back to my high school textbooks to learn what these charts meant and where I can use them. When we study math we are never aware of amazing things these charts can do in the world of Data.

I do not want you all to go back and redo things, so let me quickly teach you some basics and then do magic.

Histogram

Pareto

Boxes and Whiskers

Histograms:

Histograms are used for Univariate Analysis(analyzing data of one variable). They are also used for probability distributions. Check Skewness and Kurtosis. In simple terms, skewness is checking asymmetry and Kurtosis is peakedness of the distribution.

In Analytics it is very important to know the outliers of the data so we use these histograms to detect outliers.

I want to take data that I used in Data science projects to show you the usefulness of histograms.

The purpose of this challenge was to “tell a data story about a subset of the data science community represented in this survey, through a combination of both narrative text and data exploration.”

The original dataset (multiple_choice_responses.csv) contains the survey results provided by Kaggle. The survey results from 19717 participants are shown in 246 columns, representing survey questions. Not all questions are answered by each participant, and responses contain various data types.

In Excel-Data tab-Get data-From File-choose the csv file You downloaded from the Kaggle website and Transform Power query editor(Remove Errors, Remove duplicates, Remove Blank rows)-CLose and load.

The total data now is in 4164rows excluding headings.

Filter the Column 21 to replace the salary range as Levels.

Home tab-Find & Select-Replace

All salaries in the range from 0–9999 as 0

All Salaries in the range of 10000–19999 as 1

All Salaries in the range of 20000–29999 as 2

30000–39999 as 3

40000–49999 as 4

50000–59999 as 5

60000–69999 as 6

70000–79999 as 7

80000–89999 as 8

90000–99999 as 9

100000–124999 as 10

125000–149999 as 11

150000–199999 as 12

200000–249999 as 13

>250000 as 14

Filter to remove blanks and headings and select rest of the whole data in Column 21- Insert tab-histogram

Glimpse of first few rows
Histogram

Most of the respondents are in the salary class-0 and the data is positively skewed shows deviation from the normal distribution.In such kind of distributions Skewness can be used to obtain approximate probabilities and quantiles of distributions.

Pareto:

It is a combination of a column chart and a line chart.The column appear in the decreasing order of magnitude.The line represents cumulative percentage.

Pareto

Boxes and Whiskers:

The boxes and whiskers chart is a valuable for statistical analysis.Each data point is a box with an upward extension and a downward extension. The bottom line represents min value and the upward line represents the maximum value in series.In the box the bottom side of the box is 25 percentile and the middle line is 50 percentile /median value and the upside of the box is 75 percentile.The circles on the top represent Outliers.

Glimpse of first few rows of data.
Boxes and Whiskers

--

--

hajra Begum

Enthusiast of Data Science, Operations Research, and Mathematics/Love to cook.