We had already written an awk script to pull the durations of a particularly slow web service call, and we wanted to understand the distribution of its response times. An average can look high simply because it is skewed by a small number of extremely large samples. To check for this, we look at the standard deviation of the data set. The second awk call below is what we scripted quickly to calculate these statistics.
-bash-4.2$ awk -f parse_eom_calls.awk log/eom-main.log | \
    awk '{s+=$NF;t[++i]=$NF} END \
    {\
      for (i in t) { \
        t1[i]=(t[i] - s/NR)^2;\
        if (t[i] < s/NR) { kurtosis += 1 } };\
      for (i in t) {\
        d+=t1[i]\
      };\
      print "Average:",s/NR, "Median:",t[int(NR/2)],\
        "Standard Deviation:",sqrt(d/(length(t1)-1)),\
        "Coefficient of variation:",(sqrt(d/(length(t1)-1)))/(s/NR),\
        "Kurtosis:",(kurtosis/NR)*100\
    }'
Average: 6086.13 Median: 2909 Standard Deviation: 30952.2 Coefficient of variation: 5.08569
-bash-4.2$
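As you can see above, the distribution is very widely spread: the standard deviation is about five times the mean (a coefficient of variation of 5.09), and the median of 2909 is less than half the average of 6086.13.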
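We also added a simple calculation that the script labels "Kurtosis". Strictly speaking, kurtosis is a measure of how heavy the tails of a distribution are; what the script actually computes is the percentage of samples that fall below the mean. In our case, almost 90% of the samples are below the mean. Again, this is a good indication of skewness: a few very slow calls pull the average well above the typical response time.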
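The one-liner above takes t[int(NR/2)] as the median, which is only the true median if the durations arrive already sorted. The following is a minimal sketch of the same statistics with an explicit sort, not the script we ran. It assumes the first script emits exactly one numeric duration per line, and the function name summarize and the label below_mean_pct are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: read one numeric duration per line on stdin
# and print summary statistics. Assumes a non-zero mean.
summarize() {
  sort -n | awk '
    { v[NR] = $1 + 0; sum += v[NR] }
    END {
      if (NR < 2) { print "need at least two samples" > "/dev/stderr"; exit 1 }
      mean = sum / NR
      for (i = 1; i <= NR; i++) {
        ss += (v[i] - mean) ^ 2        # sum of squared deviations
        if (v[i] < mean) below++       # count of samples below the mean
      }
      # Input is sorted, so the middle element(s) give the median.
      if (NR % 2) median = v[(NR + 1) / 2]
      else        median = (v[NR / 2] + v[NR / 2 + 1]) / 2
      sd = sqrt(ss / (NR - 1))         # sample standard deviation
      printf "mean=%.2f median=%.2f sd=%.2f cv=%.2f below_mean_pct=%.1f\n",
             mean, median, sd, sd / mean, 100 * below / NR
    }'
}
```

For example, printf '1\n2\n3\n4\n100\n' | summarize prints mean=22.00 median=3.00 sd=43.62 cv=1.98 below_mean_pct=80.0. A single outlier is enough to push the mean far above the median and leave 80% of the samples below the mean, which is the same pattern we saw in the web service timings.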