kemba-svm.exe

SVM Software

Sample Dataset


This page offers kemba-svm.exe, an implementation of the support vector machine (SVM) algorithm. We describe how to use the software and, for clarity, show the corresponding libsvm usage for comparison.


Usage of kemba-svm.exe

 

Usage of libsvm

To use an SVM, you have to choose a kernel. Currently kemba-svm.exe supports the linear kernel, RBF kernels, polynomial kernels, and partial distance kernels [1].

The software can also run with a pre-computed kernel matrix, but we do not describe that here.

 

With libsvm, you likewise have to choose a kernel. Of the kernels discussed on this page, libsvm supports the linear kernel, RBF kernels, and polynomial kernels.

libsvm can also run with a pre-computed kernel matrix, but we do not describe that here.

We describe the usage of the software with a sample dataset. The dataset contains 72 examples: 47 positives and 25 negatives. Each example is represented by a 100-dimensional feature vector. We assume that 10 examples of each class are labeled (20 in total). The task is to predict the labels of the remaining 52 examples.

[Figures: true labels, given labels, feature vectors]

The class labels are given in the file k.y-true.csv, in which positive labels are +1, negative labels are -1, and unknown labels are 0. The feature vectors are given in the file k.X.csv, which contains a 72 by 100 matrix; each row of the matrix is the feature vector of one example. You can view the contents of the two files with spreadsheet software such as MS Excel.

An SVM yields a score for each example. kemba-svm.exe writes the scores to a file (k.yhat.csv in the examples below). An example is assigned to the positive class if its score is greater than a threshold, and to the negative class otherwise. Usually zero is used as the threshold.
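The thresholding rule can be sketched in a few lines of Python. This is only an illustration of the rule stated above; the function name predict_labels and the example scores are hypothetical and are not taken from the sample dataset.

```python
def predict_labels(scores, threshold=0.0):
    """Map SVM scores to class labels: +1 if score > threshold, otherwise -1."""
    return [1 if s > threshold else -1 for s in scores]

# Made-up scores for five examples (illustration only).
example_scores = [1.3, -0.2, 0.0, 0.7, -2.1]
print(predict_labels(example_scores))  # [1, -1, -1, 1, -1]
```

Note that a score exactly equal to the threshold falls into the negative class, because the rule requires the score to be strictly greater than the threshold.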

 

 

 

libsvm needs two input files: one for the training data and one for the test data. Both files use the following format:

<label> <index1>:<value1> <index2>:<value2>

The sample dataset converted into this format consists of the training data file k.yX-tra.dat and the test data file k.yX-tst.dat.
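If you want to produce files in this format from your own CSV data, the conversion might look like the following Python sketch. It assumes that feature indices start at 1 (as libsvm expects), that labeled examples (label +1 or -1) go to the training file, and that unknown examples (label 0) go to the test file. The function names are hypothetical, and how labels are recorded in the actual k.yX-tst.dat is not specified on this page.

```python
def to_libsvm_line(label, features):
    """Format one example as '<label> <index>:<value> ...' with 1-based indices."""
    pairs = " ".join(f"{i}:{v!r}" for i, v in enumerate(features, start=1))
    return f"{label} {pairs}"

def split_and_format(labels, rows):
    """Send labeled examples (label != 0) to training and unknown ones to test."""
    train, test = [], []
    for label, features in zip(labels, rows):
        line = to_libsvm_line(label, features)
        (train if label != 0 else test).append(line)
    return train, test

# Tiny made-up data: three examples with two features each.
labels = [1, -1, 0]
rows = [[0.5, 1.0], [0.0, 2.0], [0.25, 0.75]]
train_lines, test_lines = split_and_format(labels, rows)
print("\n".join(train_lines))
print("\n".join(test_lines))
```

For the sample dataset, labels would come from k.y-true.csv and rows from k.X.csv, and each list of lines would be written to its own file.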

In the case of the linear kernel

Download the software package from the link at the top of this page, and uncompress the zip file in the folder My Documents. A new folder `kemba-svm1-ts' is created, containing the executable file `kemba-svm.exe'. Download the sample dataset and put the feature vector file `k.X.csv', the label file `k.y-true.csv', and the parameter file for the linear kernel, `kemba-param.lin.data', into the folder kemba-svm1-ts.

At the command prompt, type

cd "My Documents\kemba-svm1-ts"

to move into the folder kemba-svm1-ts. Then type

kemba-svm.exe --cb2007a kemba-param.lin.data

Then, the score file k.yhat.csv is written. The prediction results are shown in the following figure:

[Figure: prediction results with the linear kernel]



 

For libsvm, download the software package from the libsvm webpage, and uncompress the zip file in the folder My Documents. Put the training data file k.yX-tra.dat and the test data file k.yX-tst.dat into the folder libsvm-2.**\windows.

At the command prompt, type

cd "My Documents\libsvm-2.**\windows"

to move into that folder. Then execute the following two commands:

svmtrain.exe -s 0 -t 0 k.yX-tra.dat k.model
svmpredict.exe k.yX-tst.dat k.model k.res.dat

The prediction results for the 52 unknown examples are written to the file k.res.dat.

In the case of RBF kernels

Put the parameter file for RBF kernels, `kemba-param.rbf.data', into the folder kemba-svm1-ts.

RBF kernels have a scale parameter gamma. This parameter file tries four cases: gamma=0.01, 0.05, 0.1, 0.5. Typing:

kemba-svm.exe --cb2007a kemba-param.rbf.data

runs the software. You then get the score file k.yhat.csv, which contains a 4 by 72 matrix. The first row holds the scores for gamma=0.01, the second row for gamma=0.05, the third row for gamma=0.1, and the last row for gamma=0.5. The prediction results are shown in the following figure:

[Figure: prediction results with RBF kernels]

A file named k.params-exp.txt is also written. It has as many lines as the score file has rows, and each line gives the parameter values used for the corresponding row of the score file.
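To work with several parameter settings at once, the rows of the score file can be paired with the lines of the parameter file. The Python sketch below assumes that k.yhat.csv is comma-separated with one row per setting and that k.params-exp.txt has one line per row; the exact text of each parameter line is not documented here, so the strings used below are made up.

```python
import csv
import io

def read_score_rows(score_text, params_text):
    """Pair each row of the score matrix with its parameter description line."""
    rows = [[float(x) for x in row]
            for row in csv.reader(io.StringIO(score_text)) if row]
    params = [line.strip() for line in params_text.splitlines() if line.strip()]
    if len(rows) != len(params):
        raise ValueError("score rows and parameter lines do not match")
    return list(zip(params, rows))

# Made-up stand-ins for k.yhat.csv and k.params-exp.txt (2 settings, 3 examples).
score_text = "0.8,-0.3,0.1\n1.2,-0.9,-0.4\n"
params_text = "gamma=0.01\ngamma=0.05\n"

for setting, scores in read_score_rows(score_text, params_text):
    n_pos = sum(s > 0 for s in scores)  # threshold at zero
    print(setting, "->", n_pos, "predicted positive")
```

With the real files, score_text and params_text would be the contents read from k.yhat.csv and k.params-exp.txt.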

 

RBF kernels have a scale parameter gamma. With libsvm, we try the same four cases: gamma=0.01, 0.05, 0.1, 0.5. Execute the following eight commands:

svmtrain.exe -s 0 -t 2 -g 0.01 k.yX-tra.dat k.model
svmpredict.exe k.yX-tst.dat k.model k.g_0.01.res.dat
svmtrain.exe -s 0 -t 2 -g 0.05 k.yX-tra.dat k.model
svmpredict.exe k.yX-tst.dat k.model k.g_0.05.res.dat
svmtrain.exe -s 0 -t 2 -g 0.1 k.yX-tra.dat k.model
svmpredict.exe k.yX-tst.dat k.model k.g_0.1.res.dat
svmtrain.exe -s 0 -t 2 -g 0.5 k.yX-tra.dat k.model
svmpredict.exe k.yX-tst.dat k.model k.g_0.5.res.dat

The prediction results for gamma=0.01, 0.05, 0.1, 0.5 are written to the files `k.g_0.01.res.dat', `k.g_0.05.res.dat', `k.g_0.1.res.dat', and `k.g_0.5.res.dat', respectively.
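Because the eight commands differ only in the value of gamma, they can also be generated in a loop. The following Python sketch builds the same command lines as listed above; the function name libsvm_rbf_commands is hypothetical, and the sketch only prints the commands rather than running them.

```python
def libsvm_rbf_commands(gammas, train="k.yX-tra.dat", test="k.yX-tst.dat"):
    """Return (train_cmd, predict_cmd) argument lists for each gamma value."""
    commands = []
    for g in gammas:
        train_cmd = ["svmtrain.exe", "-s", "0", "-t", "2", "-g", g, train, "k.model"]
        predict_cmd = ["svmpredict.exe", test, "k.model", f"k.g_{g}.res.dat"]
        commands.append((train_cmd, predict_cmd))
    return commands

# Gamma values are kept as strings so that they appear exactly as typed.
for train_cmd, predict_cmd in libsvm_rbf_commands(["0.01", "0.05", "0.1", "0.5"]):
    print(" ".join(train_cmd))
    print(" ".join(predict_cmd))
    # To run them from the libsvm windows folder, one could instead call
    # subprocess.run(train_cmd, check=True) and subprocess.run(predict_cmd, check=True).
```

The same pattern would apply to polynomial kernels by replacing `-t 2 -g` with `-t 1 -d` and adjusting the output file names.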

In the case of polynomial kernels

Put the parameter file for polynomial kernels, `kemba-param.poly.data', into the folder kemba-svm1-ts.

Polynomial kernels have a degree parameter p. This parameter file tries three cases: p=2, 3, 4. Typing:

kemba-svm.exe --cb2007a kemba-param.poly.data

runs the software. You then get the score file k.yhat.csv, which contains a 3 by 72 matrix. The first row holds the scores for p=2, the second row for p=3, and the last row for p=4. The prediction results are shown in the following figure:

[Figure: prediction results with polynomial kernels]

A file named k.params-exp.txt is also written. It has as many lines as the score file k.yhat.csv has rows, and each line gives the parameter values used for the corresponding row of the score file.

 

Polynomial kernels have a degree parameter p. With libsvm, we try the same three cases: p=2, 3, 4. Execute the following six commands:

svmtrain.exe -s 0 -t 1 -d 2 k.yX-tra.dat k.model
svmpredict.exe k.yX-tst.dat k.model k.d_2.res.dat
svmtrain.exe -s 0 -t 1 -d 3 k.yX-tra.dat k.model
svmpredict.exe k.yX-tst.dat k.model k.d_3.res.dat
svmtrain.exe -s 0 -t 1 -d 4 k.yX-tra.dat k.model
svmpredict.exe k.yX-tst.dat k.model k.d_4.res.dat

The prediction results for p=2, 3, 4 are written to the files `k.d_2.res.dat', `k.d_3.res.dat', and `k.d_4.res.dat', respectively.

In the case of partial distance kernels [1]

Put the parameter file for partial distance kernels, `kemba-param.kernemb.data', into the folder kemba-svm1-ts.

Type:

kemba-svm.exe --cb2007a kemba-param.kernemb.data

Then, the score file k.yhat.csv is written. The prediction results are shown in the following figure:

[Figure: prediction results with the partial distance kernel]

 

   

For trying all the kernels

Put the parameter file `kemba-param.all.data' into the folder kemba-svm1-ts. The parameter file tries nine cases: the linear kernel, RBF kernels with gamma=0.01, 0.05, 0.1, 0.5, polynomial kernels with p=2, 3, 4, and the partial distance kernel [1]. Type

kemba-svm.exe --cb2007a kemba-param.all.data

Then, the score file k.yhat.csv is written. The file contains a 9 by 72 score matrix. The prediction results are shown in the following figure:

[Figure: prediction results with all the kernels]

A file named k.params-exp.txt is also written. It has as many lines as the score file k.yhat.csv has rows, and each line gives the parameter values used for the corresponding row of the score file.

   
The prediction results of kemba-svm.exe differ from those of libsvm. One reason is that kemba-svm.exe performs kernel normalization by default, whereas libsvm does not. Another reason is that the optimal bias of a soft-margin SVM is not unique, so different SVM software can produce different biases, which leads to different solutions.
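To give a feel for what kernel normalization does, the Python sketch below applies one common form, cosine normalization, in which each entry K[i][j] is divided by sqrt(K[i][i] * K[j][j]) so that every diagonal entry becomes 1. Whether kemba-svm.exe uses exactly this formula is an assumption; this page does not specify it. The function names and the two feature vectors are made up for illustration.

```python
import math

def linear_kernel(X):
    """Linear kernel matrix K[i][j] = <x_i, x_j> for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in X] for u in X]

def normalize_kernel(K):
    """Cosine-normalize a kernel matrix: K[i][j] / sqrt(K[i][i] * K[j][j])."""
    n = len(K)
    d = [math.sqrt(K[i][i]) for i in range(n)]
    return [[K[i][j] / (d[i] * d[j]) for j in range(n)] for i in range(n)]

# Two made-up feature vectors with very different norms.
X = [[3.0, 4.0], [1.0, 0.0]]
K = linear_kernel(X)          # [[25.0, 3.0], [3.0, 1.0]]
K_norm = normalize_kernel(K)  # diagonal entries become 1, off-diagonal 0.6
print(K_norm)
```

After such a normalization the kernel values no longer depend on the length of each feature vector, which changes the optimization problem and therefore the resulting scores.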

Partial Distance Kernel

The partial distance kernel is designed as a kernel for noisy feature vectors. To reduce the influence of noise, Kato et al. [1] define a new distance function, called the partial distance, which is robust to noise. To apply the distance values to kernel methods such as SVMs, the distance matrix is converted to a kernel matrix using the principle of maximum entropy. Fujibuchi and Kato [2] show experimentally that the partial distance kernel is superior to conventional kernels such as the linear kernel, RBF kernels, and polynomial kernels for the analysis of microarray gene expression data.


References

[1] Tsuyoshi Kato, Wataru Fujibuchi, Kiyoshi Asai:
Kernels for Noisy Microarray Data,
CBRC Technical Report, AIST-02-J00001-8, Jun. 12, 2006.
[2] Wataru Fujibuchi* and Tsuyoshi Kato*:
Classification of Heterogeneous Microarray Data by Maximum Entropy Kernel,
BMC Bioinformatics, accepted. (*These authors contributed equally.)

 

 

 

July, 2007  Tsuyoshi Kato