The field of Machine Learning addresses the question of how to make machines able to learn. Learning in this context is understood as inductive inference: one observes examples that represent incomplete information about some statistical phenomenon.
In unsupervised learning one typically tries to uncover hidden regularities (e.g. clusters) or to detect anomalies in the data (for instance some unusual machine function or a network intrusion). In supervised learning, each example comes with a label, which is supposed to be the answer to a question about the example. If the label is discrete, the task is called a classification problem; for real-valued labels we speak of a regression problem. Based on these examples (including the labels), one is particularly interested in predicting the answer for other cases before they are explicitly observed. Hence, learning is not only a question of remembering but also of generalizing to unseen cases.
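The notions of real-valued labels and generalization can be made concrete with a small sketch. The following fits a least-squares line to a handful of labeled examples and then predicts the label of an unseen case; the data points are invented purely for illustration.

```python
# Minimal sketch of supervised regression: real-valued labels, and a
# prediction ("generalization") for an input that was never observed.
# All numbers below are invented for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]          # noisy observations of y ~ 2x + 1

# Least-squares fit of y = a*x + b.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
    sum((x - mx) ** 2 for x in xs)
b = my - a * mx

x_unseen = 4.0
print(a * x_unseen + b)            # the learned rule extrapolates beyond the data
```

The point of the sketch is that the prediction for `x_unseen` does not come from remembering a stored answer but from a rule inferred from the labeled examples.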
The field of machine learning offers a great variety of learning algorithms; the methods most frequently used for data analysis tasks (usually classification) are k-Nearest Neighbor Classification, Linear Discriminant Analysis, Decision Trees, Neural Networks, Support Vector Machines and Boosting. The first four are traditional techniques that have been widely used in the past and work reasonably well when analyzing low-dimensional data sets with sufficiently many labeled training examples. The last two methods, Support Vector Machines and Boosting, have received a lot of attention in the Machine Learning community recently. They are able to solve high-dimensional problems with very few examples (e.g. fifty) quite accurately and also work efficiently when examples are abundant (for instance several hundred thousand examples).
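Of the methods just listed, k-Nearest Neighbor Classification is the simplest to state: a new example receives the majority label among the k closest training examples. A minimal sketch, with a toy two-dimensional data set invented for illustration:

```python
from collections import Counter

def knn_classify(train, x_new, k=3):
    """Predict a discrete label by majority vote among the k training
    examples closest to x_new (Euclidean distance)."""
    dist = lambda p, q: sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5
    nearest = sorted(train, key=lambda ex: dist(ex[0], x_new))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two well-separated clusters with labels "A" and "B".
train = [((0.0, 0.1), "A"), ((0.2, 0.0), "A"), ((0.1, 0.3), "A"),
         ((1.0, 1.1), "B"), ((0.9, 1.0), "B"), ((1.2, 0.9), "B")]

print(knn_classify(train, (0.2, 0.2)))   # near the "A" cluster
print(knn_classify(train, (1.0, 1.0)))   # near the "B" cluster
```

This brute-force version scans the whole training set for every query, which hints at why k-NN, while simple and effective in low dimensions, does not scale as gracefully as the more recent methods to high-dimensional problems.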