RFM is certainly not perfect. We identified a company purchasing our product at huge discounts (sales, etc.). As such, they show up as a good customer, but only from the standpoint of the top line.
The key is to check gross margin. This will show those who are the most profitable, and don’t simply purchase everything when it is heavily discounted. It’s good to know each type of customer, but the most profitable ones should be part of the analysis. This is not normally the case when using RFM.
hive> select member_key, rfm_recency, rfm_frequency, rfm_monetary, > rfm_recency*100 + rfm_frequency*10 + rfm_monetary as rfm_combined > from (select member_key, > ntile (5) over (order by last_purchase_date) as rfm_recency, > ntile (5) over (order by count_purchases) as rfm_frequency, > ntile (5) over (order by total_amount) as rfm_monetary > from (select member_key, > max(trxn_date) as last_purchase_date, > count(*) as count_purchases, > sum(ttl_amount_after_discount) as total_amount > from fact_transaction_summary > group by member_key) a > ) b > order by rfm_combined desc > limit 10; Query ID = hive_20161110154920_4402f3b1-ee9d-4922-ae87-50a85e7c2c48 Total jobs = 5 Launching Job 1 out of 5 Number of reduce tasks not specified. Estimated from input data size: 160 OK 1198618327 5 5 5 555 1205639576 5 5 5 555 1169604726 5 5 5 555 1135392826 5 5 5 555 1139089861 5 5 5 555 1111592362 5 5 5 555 1199168550 5 5 5 555 1188063454 5 5 5 555 1222207202 5 5 5 555 1197784897 5 5 5 555 Time taken: 2637.964 seconds, Fetched: 10 row(s) hive>