Feature selection

Feature subset selection (FSS), or feature selection for short, is a machine learning technique in which only a subset of the available features is used by a learning algorithm. FSS is necessary because it is sometimes technically impossible to include all of the features, or because it becomes difficult to differentiate reliably when a large number of features but only a small number of data records is available.

Filter approach

Compute a measure that distinguishes between the classes. Weight the features by this measure and choose the best n. The learning algorithm is then applied to this feature subset. Filters compute the intrinsic properties of the data either univariately (e.g. Euclidean distance, chi-square test) or multivariately (e.g. correlation-based filter).

Advantages:

Quickly computable
Scalable
Intuitively interpretable

Cons:

Redundant features are kept (related features receive similar weights)
Dependencies on the learning algorithm are ignored
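
The filter steps above (score each feature, rank, keep the best n) can be sketched as follows. This is a minimal illustration, not a reference implementation: the scoring function (a Fisher-style ratio of squared class-mean distance to class variance), the function names and the toy data are all assumptions made for the example, and it handles two classes only.

```python
from statistics import mean, pvariance

def fisher_score(values, labels):
    """Score one feature for a two-class problem (labels 0 and 1):
    squared distance of the class means divided by the summed class variances."""
    a = [v for v, label in zip(values, labels) if label == 0]
    b = [v for v, label in zip(values, labels) if label == 1]
    spread = pvariance(a) + pvariance(b)
    # The small constant avoids division by zero for constant features.
    return (mean(a) - mean(b)) ** 2 / (spread + 1e-12)

def filter_select(X, y, n):
    """Return the (sorted) indices of the n highest-scoring features."""
    n_features = len(X[0])
    scores = [fisher_score([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n])

# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.1, 4.5, 0.2],
    [2.9, 5.0, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

selected = filter_select(X, y, n=1)
```

Note that each feature is scored in isolation, so two strongly correlated features would both rank highly, which is the redundancy drawback listed above.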

Wrapper approach

Search the set of all possible feature subsets. The learning algorithm is applied to each candidate subset. The search can be either deterministic (e.g. forward selection, backward elimination) or randomized (e.g. simulated annealing, genetic algorithms).

Advantages:

Finds a feature subset that is well suited to the learning algorithm
Considers combinations of features, not only each feature individually
Removes redundant features
Easy to implement
Interacts with the learning algorithm

Cons:

Very time consuming
With heuristic search, there is a risk of finding only local optima
Risk of overfitting the data
Dependent on the learning algorithm
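
As an illustration of a deterministic wrapper search, the sketch below runs greedy forward selection around a learner. The learner (a nearest-centroid classifier), the evaluation scheme (leave-one-out accuracy), the function names and the toy data are assumptions chosen to keep the example self-contained; any other learning algorithm could be wrapped in the same way.

```python
def nearest_centroid_predict(train_X, train_y, x, features):
    """Classify x by the closest class centroid, using only the given features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, t in zip(train_X, train_y) if t == label]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in features]
        dist = sum((x[j] - c) ** 2 for j, c in zip(features, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of the classifier restricted to `features`."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i], features) == y[i]:
            hits += 1
    return hits / len(X)

def forward_selection(X, y):
    """Greedily add the feature that improves accuracy most; stop when none does."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [j]), j) for j in remaining]
        score, j = max(scored, key=lambda pair: pair[0])  # first best wins ties
        if score <= best:
            break
        selected.append(j)
        remaining.remove(j)
        best = score
    return selected, best

# Hypothetical toy data: feature 0 alone separates the two classes.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.1, 4.5, 0.2],
    [2.9, 5.0, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

selected, accuracy = forward_selection(X, y)
```

The learner is retrained for every candidate subset, which is why the approach is time consuming, and the greedy search stops at the first subset it cannot improve, which may be only a local optimum.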

Embedded approach

The search for an optimal subset is built directly into the learning algorithm.

Advantages:

Better runtime performance and lower computational complexity
Dependencies between data points are modeled

Cons:

Selecting the subset depends strongly on the learning algorithm.

Examples:

Decision trees
Weighted naive Bayes
Subset selection using the weight vector of an SVM
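
To illustrate how selection can happen inside training, the sketch below fits a depth-one decision tree (a decision stump): choosing the split feature is part of fitting the model, so no separate search over subsets is needed. The Gini impurity criterion, the function names and the toy data are assumptions made for this example; a full decision tree would repeat the same step recursively.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def fit_stump(X, y):
    """Train a depth-one decision tree.
    Returns (feature index, threshold, weighted impurity of the split)."""
    best = None
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        # Candidate thresholds lie midway between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or impurity < best[2]:
                best = (j, t, impurity)
    return best

# Hypothetical toy data: feature 0 alone separates the two classes.
X = [
    [1.0, 5.0, 0.3],
    [1.2, 4.0, 0.9],
    [0.9, 6.0, 0.1],
    [3.0, 5.5, 0.8],
    [3.1, 4.5, 0.2],
    [2.9, 5.0, 0.7],
]
y = [0, 0, 0, 1, 1, 1]

feature, threshold, impurity = fit_stump(X, y)
```

The features a trained tree actually splits on form the selected subset; features that never appear in a split are discarded implicitly.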
