Statistical Analysis with Excel-Chapter8

hajra Begum
5 min readMar 23, 2022

When you are doing Analysis in Statistics you have two main categories :

a.Graphical Analysis(Data Visualization)

b.Descriptive and Inferential Statistics

I already went through all the visualizations possible through the chart types, but as an analyst that is only giving you a bigger picture of the data and is not going to give deeper insights into the data for decision making. So we have more tools in statistics. Therefore we will move to study Descriptive statistics.

In descriptive Statistics we have

a.Measures of Frequency(count and Frequency)

b.Measures of Central tendency(Mean,Median,Mode)

c.Measures of Dispersion(Variance, Standard deviation, and Range)

d.Measures of position. (percentiles)

We already practiced measures of Frequency when formatting Pivot tables for Charts.

So let us study the Measures of Central Tendency:

Mean: The statistical term for Average is mean.

We all calculate the average many times but why it is important to Data Analysts?

We practically do testing on Sample and get results to estimate the effects of the procedure on the population that produced the sample. The mean of the sample is the best estimate of the population mean.

I don’t need to tell you the formula but I’m going to revise it anyway.

Mean=Sum of numbers/Amount of numbers we added up

https://www.onlinemathlearning.com/population-mean.html

Population measures are called Parameters.

Sample measures are called Statistics.

population mean is (pronounced ‘Myou”)

Sample mean(pronounced “X bar”)

download the Equipment losses and personnel losses data and save on your desktop.

Data tab-Get data-From file-choose the Russia_equipemnt_losses file and load in Excel sheet.

We will go through

Average

AverageA

Averageif

AverageIFS

Trimmean

Geometric mean

Harmonic mean

I cannot just go and calculate average,it is important to check if our data is cleaned and is in the form we need.

Firstly I see that Day 1 is not listed in the table so add a row and put values for all columns as 0.You will understand why I am doing this in the next step.

Secondly write N/A in all rows of special equipment column and 0 for row 21,10 on row 22, 0 on row 23, 1 on row 24, 0 on row 25,1on row 26,1 on row 27 and 2 on row 28.

Finally I observed that the values are aggregated over the 27 days but I need daily losses for equipments.

So now I need to transform my table to pivot table.

Then choose Rows(day),Values(Sum of aircraft,sum of helicopter,sum of tank) (Right click and choose value filed settings as show values as difference from previous.

Pivot table fields
Value Field settings

Do the same for helicopter and tank ,value field settings.

Because I am taking difference I need the Day 1 values otherwise I wont have Day 2 values after the difference.

After the settings the table look

Formulas tab-more functions-statistical-Average

Average

Do the same for three columns.

Results

** Average ignores the cells with text/true/false values***

Quickly check how AverageA works.In the table itself as you already did the changes we calculate average of special equipment and no need for pivot table.

AverageA function
Result

AverageIf:

Conditional averages are very helpful in statistics.I will now take

Russia _personnel losses table.

follow same steps

Add row 1 in table with 0 value in all columns.

Convert the table into Pivot table.

Choose pivot fields as

Pivot table fields

then value field setting of Values (sum of personnel).

pivot table look

Formulas tab-more functions-statistical-AverageIF

averageif

Getting the average of personnel lossess only if the lossess are more than 1000.

Averageifs:

Getting the average of personnel lossess based on two criteria ≥1000 and day≤15.

AvergaeIFS
Result

TRIMMean:

In any data we need to look for statistical Outliers.Outlier is an extreme value in a set.These outliers can get in the way of conclusions based on the averaging of the data.

It is often recommended to eliminate them before we calculate the mean.

*** But in advanced Analytics of Data science we study the effects of Outliers before removing***

But in here we use Trimmean for trimming the mean ,to be able to use when needed.

I removed only 0.2% of outliers from the mean calculation.

Result

Median:

Why median when we already use mean, because mean is sensitive to extreme values(outliers).

The mean value will change drastically when we have outliers and at that time mean would not be a good statistic to measure.

Median is the middle value in the group of numbers after arranging them in descending order. It is also the 50th percentile value, meaning half of the values fall below and another half of the values fall above the median value.

we will use the same russia_personnel_lossess table.

Median
Result

Mode:

Mode is the value that occurs most frequently in the group .

Mode
Result

Use Mode. multi if you have two/more modes in the array.

--

--

hajra Begum

Enthusiast of Data Science, Operations Research, and Mathematics/Love to cook.