Data Mining explained for the layman

Data mining is a buzzword. What it describes is really a form of machine learning, and there is no definitive border between these overlapping disciplines. The takeaway is not to worry too much about the distinction, and to ignore anyone who does, unless they are your boss or paying very well.

The reason we need data mining is that our usual, rational approach relies on variables or indicators that already exist in the data we use every day. When these no longer give us anything new, we have reached the end of the road with traditional methods. Data mining and machine learning approaches start with fewer preconceived notions, occasionally none, and simply look for patterns that could be useful.

A simple example is looking for recognisable patterns and then checking for correlations between them. A cancer care company might notice that every time a particular value in its records is 3, the likelihood of a cancer diagnosis is noticeably higher. It might be very difficult to see why this is so, and more research might be called for, or a logical explanation might easily be found, giving a new and easier method of diagnosis.
Often the patterns are not useful, and we are always urged to look for a logical explanation of why a pattern holds before taking risks with it.
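
For the curious, the "value is 3" idea can be sketched in a few lines of Python. This is only an illustration: the records are invented, and the names `records` and `diagnosis_rate` are my own assumptions rather than a real medical dataset or method.

```python
# Hypothetical records: (marker_value, was_diagnosed). Invented for illustration only.
records = [
    (3, True), (3, True), (3, False), (3, True),
    (1, False), (2, False), (1, True), (2, False),
    (4, False), (1, False),
]


def diagnosis_rate(rows):
    """Return the share of rows with a positive diagnosis (0.0 if there are no rows)."""
    if not rows:
        return 0.0
    return sum(1 for _, diagnosed in rows if diagnosed) / len(rows)


# Split the records by whether the marker equals 3, then compare the two rates.
with_three = [r for r in records if r[0] == 3]
without_three = [r for r in records if r[0] != 3]

rate_with = diagnosis_rate(with_three)
rate_without = diagnosis_rate(without_three)

print(f"Diagnosis rate when value is 3: {rate_with:.0%}")
print(f"Diagnosis rate otherwise:       {rate_without:.0%}")
```

A gap between the two rates is only a hint. With so few records it could easily be chance, which is exactly why the logical explanation matters.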

A common use in the business world is to track patterns in customers' online usage to predict when a customer might cancel their account (churn). In reality, were this money spent on taking care of customers, such as answering phones and being helpful, the results would likely be much better, but it would not be as much fun.
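
As a rough sketch of what "tracking patterns in usage" might look like, the Python below flags customers whose recent activity has dropped sharply. It is a simple rule of thumb, not a trained machine learning model, and everything in it is assumed for illustration: the `usage` data, the `at_risk` function, and the thresholds `recent_weeks` and `drop_ratio`.

```python
# Hypothetical weekly login counts per customer, oldest week first. Invented data.
usage = {
    "alice": [9, 8, 9, 7, 2, 1],
    "bob": [4, 5, 4, 5, 4, 5],
    "carol": [6, 6, 5, 3, 2, 2],
}


def at_risk(weekly_logins, recent_weeks=2, drop_ratio=0.5):
    """Flag a customer whose recent average usage fell below drop_ratio of their earlier average."""
    earlier = weekly_logins[:-recent_weeks]
    recent = weekly_logins[-recent_weeks:]
    if not earlier or not recent:
        return False  # not enough history to judge
    earlier_avg = sum(earlier) / len(earlier)
    recent_avg = sum(recent) / len(recent)
    return recent_avg < drop_ratio * earlier_avg


# Collect the names of customers whose usage pattern looks like a warning sign.
flagged = [name for name, logins in usage.items() if at_risk(logins)]
print("Customers who might be about to cancel:", flagged)
```

Real churn models weigh many more signals than login counts, but the principle is the same: find a pattern that tends to come before a cancellation and watch for it.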

More recently, machine learning tools have advanced considerably, and a clearer division is emerging between data mining and machine learning as distinct approaches. Both are often mentioned in discussions about AI, and indeed there are areas of overlap.

The term is frequently applied to any form of large-scale data or information processing (collection, extraction, warehousing, analysis, and statistics) as well as any application of computer decision support systems, including artificial intelligence, machine learning, and business intelligence. The book Data mining: Practical machine learning tools and techniques with Java[7] (which covers mostly machine learning material) was originally to be named just Practical machine learning, and the term data mining was only added for marketing reasons.[8] Often the more general terms (large scale) data analysis and analytics, or, when referring to actual methods, artificial intelligence and machine learning, are more appropriate. (Source: Wikipedia)

The “Explained series” is planned to build into a collection of explanations and commentaries that can be trusted to tell the story straight, without bias, and to make the subjects accessible to the layman. The latter is not always easy, as some of these terms refer to genuinely complex subject matter, while others are simply too vague to pin down (there’s another word for that). There is also a limit to how far I can go in explaining every term when there are a lot of them, so I sometimes have to rely on your initiative to right-click the offending word and look it up.
If you want an answer on something and you can’t find it easily, please just ask in the comments section. I will appreciate not having to research the next topic.

About the author