{"id":6338,"date":"2017-06-15T20:06:14","date_gmt":"2017-06-16T01:06:14","guid":{"rendered":"http:\/\/appcrawler.com\/wordpress\/?p=6338"},"modified":"2017-06-15T20:06:14","modified_gmt":"2017-06-16T01:06:14","slug":"awk-statistical-functions","status":"publish","type":"post","link":"http:\/\/appcrawler.com\/wordpress\/2017\/06\/15\/awk-statistical-functions\/","title":{"rendered":"awk statistical functions"},"content":{"rendered":"<p>We had already written an awk script to pull durations for a particularly slow web service call, and we wanted to understand the distribution of its response times.  Often the average is high only because it is skewed by a small number of extremely large samples.  To see how spread out the data is, we look at the standard deviation.  The second awk call below is what we quickly scripted to calculate these statistics.  Because a median requires sorted values, it uses gawk's asort() function.<\/p>\n<pre>\r\n-bash-4.2$ awk -f parse_eom_calls.awk log\/eom-main.log | \\\r\n           awk '{ t[++n] = $NF + 0; s += $NF }\r\n                END {\r\n                  mean = s \/ n\r\n                  for (i = 1; i &lt;= n; i++) {\r\n                    d += (t[i] - mean)^2            # sum of squared deviations\r\n                    if (t[i] &lt; mean) kurtosis++     # count samples below the mean\r\n                  }\r\n                  asort(t, v, \"@val_num_asc\")       # gawk extension: sorted copy in v[1..n]\r\n                  median = (n % 2) ? v[(n + 1) \/ 2] : (v[n \/ 2] + v[n \/ 2 + 1]) \/ 2\r\n                  sd = sqrt(d \/ (n - 1))            # sample standard deviation\r\n                  print \"Average:\", mean,\r\n                        \"Median:\", median,\r\n                        \"Standard Deviation:\", sd,\r\n                        \"Coefficient of variation:\", sd \/ mean,\r\n                        \"Kurtosis:\", (kurtosis \/ n) * 100\r\n                }'\r\nAverage: 6086.13 Median: 2909 Standard Deviation: 30952.2 Coefficient of variation: 5.08569\r\n-bash-4.2$\r\n<\/pre>\n<p>As you can see above, the response times are very widely spread: the standard deviation is about five times the average.<\/p>\n<p>We also added a simple calculation of <a href=\/wordpress\/2009\/04\/15\/does-correlation-help-at-all\/ target=_blank>kurtosis<\/a>.  
Strictly speaking, this value is not kurtosis, which describes how heavy the tails of a distribution are; it is the percentage of samples that fall below the mean.  In our case, almost 90% of the samples are below the mean, which is another good indication that the distribution is skewed by a long tail of slow calls.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We had already written an awk script to pull durations for a particularly slow web service call. We wanted to understand the distribution of the response times. Often, the average is high, but it is skewed by a number of&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"more-link\" href=\"http:\/\/appcrawler.com\/wordpress\/2017\/06\/15\/awk-statistical-functions\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_mi_skip_tracking":false,"footnotes":""},"categories":[14,16],"tags":[],"_links":{"self":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/6338"}],"collection":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/comments?post=6338"}],"version-history":[{"count":12,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/6338\/revisions"}],"predecessor-version":[{"id":6351,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/posts\/6338\/revisions\/6351"}],"wp:attachment":[{"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/media?parent=6338"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/categories?post=6338"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/appcrawler.com\/wordpress\/wp-json\/wp\/v2\/tags?post=6338"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}