Data Mining Techniques
Data mining is an important analytic process designed to explore data, much like the real-life process of mining minerals, gold, or other precious metals from underneath the earth. The most important task in data mining is to extract non-trivial knowledge from large amounts of data.
Extracting important knowledge from a mass of data can be crucial, sometimes essential, for the next phase of the analysis: modeling. Many assumptions and hypotheses will be drawn from your models, so it is important to spend appropriate time “massaging” the data, i.e., preparing it and extracting the important information, before moving on to modeling.
Although the definition of data mining given above may seem clear, you may find that many people mistakenly equate data mining with tasks such as generating reports, charts, and histograms, issuing SQL queries to a database, and, more generally, visualizing a relational table or producing multidimensional views of it.
For example, data mining is not about extracting the group of people who live in a specific city from our database; the data mining task in this case would be to find groups of people in our data with similar preferences or interests. Similarly, data mining is not about plotting, say, the number of people who have cancer against power-line voltage; a data mining task here could instead be to answer a question such as: is the chance of getting cancer higher if you live near a power line?
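The contrast between a query and a data mining task can be sketched in Python. Everything here is hypothetical: the records, the two preference features (hours of sport and of reading per week), and the choice of k-means with k = 2 as the grouping method, which is one common clustering algorithm among many rather than one the text prescribes. The first step merely selects people by city; the second discovers groups of similar people without being told what the groups are.

```python
import math

# Hypothetical records: (name, city, hours of sport per week, hours of reading per week)
people = [
    ("Ann", "Boston", 9.0, 1.0),
    ("Bob", "Denver", 8.5, 0.5),
    ("Cid", "Boston", 1.0, 7.5),
    ("Dee", "Denver", 0.5, 8.0),
    ("Eve", "Boston", 9.5, 1.5),
    ("Fay", "Denver", 1.5, 9.0),
]

# A query, not data mining: select everyone who lives in one city.
bostonians = [name for name, city, _, _ in people if city == "Boston"]

def kmeans(points, k, iterations=20):
    """Plain k-means: repeatedly assign each point to its nearest centroid,
    then move each centroid to the mean of the points assigned to it."""
    # Naive initialization with the first k points (libraries usually seed more carefully).
    centroids = list(points[:k])
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Recompute centroids; keep the old one if a cluster ends up empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    # Final label for each point: index of its nearest centroid.
    return [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]

# Data mining: discover groups of people with similar interests, ignoring the city.
features = [(sport, reading) for _, _, sport, reading in people]
labels = kmeans(features, k=2)
groups = {}
for (name, *_), label in zip(people, labels):
    groups.setdefault(label, []).append(name)

print(bostonians)               # ['Ann', 'Cid', 'Eve']
print(sorted(groups.values()))  # [['Ann', 'Bob', 'Eve'], ['Cid', 'Dee', 'Fay']]
```

The query returns a fixed subset defined in advance, whereas the clustering step finds a sports-oriented group and a reading-oriented group that cut across both cities.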
The tasks of data mining are twofold: to create predictive power (using features to predict unknown or future values of the same or other features) and to create descriptive power (finding interesting, human-interpretable patterns that describe the data).
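As a minimal illustration of predictive power, the Python sketch below fits a straight line by ordinary least squares to a hypothetical weekly-sales series and uses it to estimate a week that has not been observed yet. The data, the names `weeks`, `sales`, and `fit_line`, and the choice of a single-feature linear model are all assumptions for illustration; in practice one would typically use a library such as NumPy or scikit-learn and check the model on held-out data.

```python
# Hypothetical data: week number (feature) and weekly sales in thousands (target).
weeks = [1, 2, 3, 4, 5]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

def fit_line(xs, ys):
    """Ordinary least squares with one feature: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the shared 1/n factors cancel.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

intercept, slope = fit_line(weeks, sales)
# Predictive use: estimate the not-yet-observed value for week 6.
forecast = intercept + slope * 6
print(f"sales ~ {intercept:.2f} + {slope:.2f} * week; week 6 forecast: {forecast:.2f}")
```

Here one feature (the week) predicts another (sales), including a future value, which is the predictive side of the distinction; the fitted line itself is not meant to describe why sales grow.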
There are four main data mining techniques:
- Regression (predictive)
- Association Rule Discovery (descriptive)
- Classification (predictive)
- Clustering (descriptive)
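Two of the techniques in this list, association rule discovery (descriptive) and classification (predictive), can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the shopping baskets and labeled messages are invented, the rule miner brute-forces item pairs only (real algorithms such as Apriori prune the search and handle larger itemsets), and the classifier uses k-nearest neighbours as one simple example of classification. The thresholds and the hypothetical helper names `pair_rules` and `classify` are illustrative choices.

```python
import math
from collections import Counter
from itertools import combinations

# --- Association rule discovery (descriptive) ---
# Hypothetical shopping baskets (transactions).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"beer", "chips", "milk"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def pair_rules(min_support=0.4, min_confidence=0.6):
    """Brute-force rules lhs -> rhs over item pairs, kept if they meet both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, b in combinations(items, 2):
        s = support({a, b})
        if s < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            # Confidence: how often rhs appears in baskets that contain lhs.
            confidence = s / support({lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, s, confidence))
    return rules

rules = pair_rules()
for lhs, rhs, s, conf in rules:
    print(f"{lhs} -> {rhs}: support {s:.2f}, confidence {conf:.2f}")

# --- Classification (predictive) ---
# Hypothetical labeled messages: (links in message, exclamation marks) -> label
training = [
    ((8, 6), "spam"),
    ((7, 9), "spam"),
    ((9, 7), "spam"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
    ((2, 1), "ham"),
]

def classify(point, k=3):
    """k-nearest neighbours: majority label among the k closest training examples."""
    neighbours = sorted(training, key=lambda ex: math.dist(point, ex[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

print(classify((6, 8)), classify((1, 2)))
```

The rule miner describes the existing data (for instance, every basket with butter also has bread), while the classifier assigns a label to a new, unlabeled example.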