Category: Statistics

Market Basket Analysis with Python

A very elegant example is available at http://blog.derekfarren.com/2015/02/how-to-implement-large-scale-market.html. Below is a complete script built from what the link above provides, along with an example of our output. Our file was formatted slightly differently, so we had to change the split command. We…
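The core of market basket analysis is counting how often items are bought together. The following is not the script from the linked post. It is a minimal sketch of frequent-pair counting, assuming one basket per line with items separated by commas (the delimiter is exactly what the split call controls, so adjust it to your file). The function name `frequent_pairs`, the `min_support` threshold and the sample baskets are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support=2):
    """Count how many baskets contain each unordered pair of items and
    keep the pairs seen in at least `min_support` baskets."""
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # dedupe and sort so (a, b) == (b, a)
        pair_counts.update(combinations(items, 2))
    return {pair: n for pair, n in pair_counts.items() if n >= min_support}


if __name__ == "__main__":
    # Hypothetical input lines; change "," if your file uses another delimiter.
    lines = ["bread,milk,eggs", "bread,butter", "milk,eggs,bread", "butter,milk"]
    baskets = [line.split(",") for line in lines]
    pairs = frequent_pairs(baskets)
    for pair, n in sorted(pairs.items(), key=lambda kv: (-kv[1], kv[0])):
        print(pair, n)
```

Sorting each basket before calling `combinations` means a pair is always counted under the same key regardless of the order items appear in the file.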

What is the most likely dice roll?

Probably common sense, but this simulation statistically supports the most common roll of the dice. Of course, this assumes physically fair dice (not loaded, etc.). 🙂

[root@cmhlcanlyodb01 ~]# cat rand.py
import random
d = dict()
for i in range(1,100001):…
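The rand.py excerpt above is cut off, so this is not the original script. It is a minimal sketch of the same kind of experiment, assuming two fair six-sided dice rolled 100,000 times; the function name `simulate_rolls` and its parameters are hypothetical. With two dice, a sum of 7 can be made six ways out of 36, more than any other total, so it should come out on top.

```python
import random
from collections import Counter


def simulate_rolls(trials=100_000, dice=2, sides=6, seed=None):
    """Roll `dice` fair dice `trials` times and tally how often each sum appears."""
    rng = random.Random(seed)  # seedable generator so runs can be reproduced
    counts = Counter()
    for _ in range(trials):
        total = sum(rng.randint(1, sides) for _ in range(dice))
        counts[total] += 1
    return counts


if __name__ == "__main__":
    counts = simulate_rolls(seed=42)
    trials = sum(counts.values())
    for total in sorted(counts):
        print(f"{total:2d}: {counts[total] / trials:.4f}")
    print("Most common sum:", counts.most_common(1)[0][0])
```

Using a `Counter` instead of a plain `dict` avoids having to initialise each key before incrementing it.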

Birthday paradox in Python

I love things like this. Earlier, we produced a working example of the Monty Hall problem. In this post, we show something similar for the birthday paradox…

c:\Python27>type c:\Users\showard\bday.py
from random import randint
cnt = 0
for k in range(2000):…
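Since the bday.py excerpt is truncated, here is a separate minimal sketch (not the original script) that estimates the chance that at least two people in a group share a birthday and compares it with the exact value. It assumes 365 equally likely birthdays and ignores leap years; the function names and the default of 23 people are illustrative choices.

```python
import random


def shared_birthday_probability(people=23, days=365):
    """Exact probability that at least two of `people` share a birthday."""
    p_all_distinct = 1.0
    for i in range(people):
        p_all_distinct *= (days - i) / days  # i-th person avoids the first i birthdays
    return 1.0 - p_all_distinct


def simulate_shared_birthday(people=23, trials=20_000, days=365, seed=None):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randint(1, days) for _ in range(people)]
        if len(set(birthdays)) < people:  # a repeat means a shared birthday
            hits += 1
    return hits / trials


if __name__ == "__main__":
    print("exact:    ", round(shared_birthday_probability(), 4))
    print("simulated:", round(simulate_shared_birthday(seed=7), 4))
```

The exact formula multiplies the probabilities that each successive person misses every birthday already taken; with 23 people the result is just over one half.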

Association rules in Java

Association rules are a concept in which relationships between different elements of a common set can be established. For example, a study may be undertaken to determine the impact of one externally employed parent on children's GPA in a household,…
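An association rule A → B is usually judged by its support (how often A and B occur together), confidence (how often B occurs when A does) and lift (confidence divided by how often B occurs overall). The post itself is in Java; this is a minimal Python sketch of those three measures, not the author's implementation. The household attribute labels and data below are hypothetical, loosely modelled on the GPA example.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    `transactions` is a list of sets; a transaction matches an itemset
    when it contains every item in it.
    """
    antecedent, consequent = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_c = sum(1 for t in transactions if consequent <= t)
    n_ac = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return support, confidence, lift


if __name__ == "__main__":
    # Hypothetical households described by attribute labels.
    households = [
        {"one_parent_employed", "high_gpa"},
        {"one_parent_employed", "high_gpa"},
        {"one_parent_employed"},
        {"both_parents_employed", "high_gpa"},
        {"both_parents_employed"},
    ]
    s, c, l = rule_metrics(households, {"one_parent_employed"}, {"high_gpa"})
    print(f"support={s:.2f} confidence={c:.2f} lift={l:.2f}")
```

A lift above 1 suggests the antecedent and consequent co-occur more often than independence would predict, though with a sample this small it says nothing about causation.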

Monty Hall simulation in Python

The Monty Hall problem, or paradox as it is sometimes known, has always intrigued me, so much so that I wrote a Python-based simulator to test whether it holds. The crux of it, and as its…
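The following is a minimal sketch of such a simulator, not the one from the post. It assumes the standard rules: the car is placed uniformly at random, the host always opens a door that hides a goat and was not picked, and when two such doors exist he chooses between them at random. The function names `play` and `win_rate` are hypothetical.

```python
import random


def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Host opens a door that is neither the contestant's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the only door that is neither the original pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car


def win_rate(switch, trials=100_000, seed=None):
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print("stay:  ", win_rate(False, seed=1))
    print("switch:", win_rate(True, seed=2))
```

Staying wins only when the first pick was right (1 in 3), so switching should win roughly two thirds of the time.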

Statistical functions with Java

Below is a simple set of classes for performing your own statistical analysis. As noted, they are simple (no multiple regression analysis, etc.).

class myStats {
    double standardDeviation (double[] array) {
        double[] arr = new double[array.length];
        double av = average(array);…
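The Java excerpt stops partway through `standardDeviation`, so it is not clear whether it divides by n (population) or n - 1 (sample). The following is a minimal Python sketch of the same two functions with that choice exposed as a parameter; it mirrors the names `average` and `standardDeviation` but is not a port of the original class.

```python
import math


def average(values):
    """Arithmetic mean."""
    return sum(values) / len(values)


def standard_deviation(values, sample=True):
    """Square root of the averaged squared deviation from the mean.

    sample=True divides by n - 1 (sample estimate);
    sample=False divides by n (population value).
    """
    av = average(values)
    squared = [(v - av) ** 2 for v in values]
    denom = len(values) - 1 if sample else len(values)
    return math.sqrt(sum(squared) / denom)


if __name__ == "__main__":
    data = [2, 4, 4, 4, 5, 5, 7, 9]
    print(average(data), standard_deviation(data, sample=False), standard_deviation(data))
```

For the sample data the squared deviations add up to 32, so the population standard deviation is sqrt(32 / 8) = 2 and the sample version is sqrt(32 / 7).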

Multicollinearity

Predictors should not be strongly correlated with one another. In stepwise selection, one variable may absorb the predictive contribution of another, so the second variable never enters the model. The variables might also get inconsistent…
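One common way to check two predictors for multicollinearity is their Pearson correlation and the variance inflation factor (VIF). With exactly two predictors, regressing one on the other gives R² = r², so VIF = 1 / (1 - r²). This is a minimal sketch under that two-predictor assumption; the function names and sample values are hypothetical, and a VIF above roughly 5 to 10 is only a commonly cited rule of thumb.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model has exactly two predictors.

    Undefined (division by zero) when the predictors are perfectly correlated.
    """
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


if __name__ == "__main__":
    x1 = [1, 2, 3, 4, 5]
    x2 = [1, 3, 2, 5, 4]
    print("r =", pearson_r(x1, x2), "VIF =", vif_two_predictors(x1, x2))
```

With more than two predictors, each VIF needs R² from regressing that predictor on all the others, which this sketch does not attempt.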

Does correlation help at all?

While studying how we can use statistics to better understand the performance of our applications, I came upon the concept of kurtosis. Kurtosis measures how heavy a distribution's tails are: a normal distribution has a kurtosis of 3 (an excess kurtosis of 0), so if the kurtosis is very high, the distribution is not normal…OK,…
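A minimal sketch of the moment-based (population) excess kurtosis, m4 / m2² - 3, which is about 0 for normally distributed data. Other estimators (for example bias-corrected sample versions) give somewhat different numbers on small samples. The function name and the response-time values are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: fourth central moment over squared variance, minus 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


if __name__ == "__main__":
    # Hypothetical response times in ms; one slow outlier creates a heavy tail.
    response_times = [120, 125, 118, 122, 121, 119, 900]
    print(round(excess_kurtosis(response_times), 2))
```

A single outlier dominates the fourth moment, which is why a few very slow requests push the value well above 0 even when most responses are tightly clustered.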