Shogun (Toolbox)

Shogun is an open-source machine learning toolbox. It can be used to solve regression and classification problems and to learn, among other things, hidden Markov models.

Application focus

The focus of the toolbox is clearly on so-called kernel methods (see kernel (machine learning)), with bioinformatics as the main application area. A number of kernels operating on sequences (so-called string kernels) were implemented and specifically designed for speed on large data sets. This allows the toolbox to solve problems involving string kernels on very large amounts of data (up to 10 million).
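To illustrate what a string kernel computes, the following is a minimal Python sketch of the spectrum kernel, which compares two sequences via the inner product of their k-mer count vectors; this is a conceptual illustration only, not Shogun's optimized implementation.

    from collections import Counter

    def spectrum_kernel(x: str, y: str, k: int = 3) -> float:
        """Spectrum kernel of order k: inner product of k-mer count vectors."""
        def kmer_counts(s: str) -> Counter:
            return Counter(s[i:i + k] for i in range(len(s) - k + 1))
        cx, cy = kmer_counts(x), kmer_counts(y)
        # Sum the products of counts over k-mers occurring in both strings.
        return float(sum(cx[m] * cy[m] for m in cx.keys() & cy.keys()))

    # Example: similarity of two short DNA fragments.
    print(spectrum_kernel("ACGTACGT", "ACGTTTGT", k=3))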

In particular, Shogun offers generic interfaces to many different implementations of support vector machines (SVMs), such as SVMlight and LibSVM. This allows all SVMs to use the same kernel implementations and makes it easier to add new kernel-based learning methods. In addition to the standard kernels (linear, polynomial, Gaussian and sigmoid kernels; see kernel (machine learning)), Shogun contains efficient implementations of recently published string kernels, such as the Locality Improved, Fisher, TOP, Spectrum and Weighted Degree kernel (with shifts). For the latter, the efficient LINADD optimizations were implemented.
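The standard kernels listed above have simple closed forms. The following NumPy sketch shows one common parametrization of each; the parameter names (degree, coef0, width, gamma) are illustrative and do not necessarily match Shogun's own parameter names.

    import numpy as np

    def linear(x, y):
        return x @ y

    def polynomial(x, y, degree=3, coef0=1.0):
        return (x @ y + coef0) ** degree

    def gaussian(x, y, width=1.0):
        # Parametrizations vary between libraries; here: exp(-||x - y||^2 / width).
        return np.exp(-np.sum((x - y) ** 2) / width)

    def sigmoid(x, y, gamma=0.01, coef0=0.0):
        return np.tanh(gamma * (x @ y) + coef0)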

Special features

Shogun also makes it possible to work with one's own precomputed kernels. One of the main features of the toolbox is the so-called combined kernel, which is composed of a weighted linear combination of sub-kernels. The sub-kernels do not necessarily have to operate on the same input space, but can work on different domains. Shogun can learn an optimal sub-kernel weighting via the multiple kernel learning algorithm.

In addition to SVMs for 2-class classification and regression problems, a number of linear methods are also implemented in Shogun. Examples are linear discriminant analysis (LDA), the Linear Programming Machine (LPM), (kernel) perceptrons and hidden Markov models. Shogun can process a broad spectrum of data: not only dense input matrices, but also sparse matrices and strings, each of type integer or floating point (single or double precision). Furthermore, chains of preprocessors can be attached to the inputs, so that the data can be further processed on the fly before being passed to the learning algorithms.
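The combined kernel has a compact mathematical form: given sub-kernels K_1, ..., K_m, it is K(x, y) = β_1 K_1(x, y) + ... + β_m K_m(x, y) with non-negative weights β_i, and multiple kernel learning optimizes these weights together with the SVM. The following NumPy sketch shows only the combination step for precomputed kernel matrices; it is illustrative and does not perform the MKL weight optimization itself.

    import numpy as np

    def combined_kernel(kernel_matrices, weights):
        """Weighted linear combination of precomputed sub-kernel matrices:
        K = sum_i beta_i * K_i, with beta_i >= 0. Each K_i may come from a
        different input domain, as long as all matrices are computed on the
        same set of examples."""
        weights = np.asarray(weights, dtype=float)
        assert np.all(weights >= 0), "sub-kernel weights must be non-negative"
        return sum(b * K for b, K in zip(weights, kernel_matrices))

    # Example: combine a sequence kernel and a numeric kernel on 3 examples.
    K_seq = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 2.0], [0.0, 2.0, 5.0]])
    K_num = np.array([[1.0, 0.5, 0.2], [0.5, 1.0, 0.4], [0.2, 0.4, 1.0]])
    K = combined_kernel([K_seq, K_num], weights=[0.7, 0.3])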

Interfaces

Shogun is implemented in C++ and provides interfaces to Matlab™, R, Octave and Python. These interfaces allow interactive experimentation with the learning algorithms (see Figure 1 for the Python interface), as well as batch processing of scripts on compute clusters.
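A minimal sketch of an interactive session with the Python interface, assuming the classic modular API (class names such as RealFeatures, BinaryLabels, GaussianKernel and LibSVM); exact module and class names as well as constructor signatures differ between Shogun versions, so this should be read as an illustration rather than a definitive usage example.

    import numpy as np
    # Class names follow the classic "modular" Python interface; newer
    # Shogun releases expose a different, factory-style API.
    from modshogun import RealFeatures, BinaryLabels, GaussianKernel, LibSVM

    # Toy 2-class problem: features are stored column-wise (one column per example).
    X_train = np.random.randn(2, 100)
    y_train = np.sign(np.random.randn(100))

    features = RealFeatures(X_train)
    labels = BinaryLabels(y_train)

    kernel = GaussianKernel(features, features, 1.0)  # Gaussian kernel, width 1.0
    svm = LibSVM(1.0, kernel, labels)                 # regularization constant C = 1.0
    svm.train()

    # Classify new data with the trained SVM.
    X_test = np.random.randn(2, 10)
    predictions = svm.apply(RealFeatures(X_test)).get_labels()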
