
LinearSVM is a new, extremely fast machine learning (data mining) algorithm for solving multiclass classification problems on very large data sets. It implements an original, proprietary version of a cutting plane algorithm for designing a linear support vector machine. LinearSVM is linearly scalable: the CPU time needed to create an SVM model grows linearly with the size of the training data set. Our comparisons with other known SVM models clearly show its superior performance when high accuracy is required. We would greatly appreciate it if you shared LinearSVM's performance on your data sets with us.


Features

•Efficiency in dealing with extra large data sets (say, several million training data pairs),

•Solution of multiclass classification problems with any number of classes,

•Working with high dimensional data (thousands of features, attributes) in both sparse and dense format,

•No need for expensive computing resources (a personal computer is the standard platform),

•Ideal for contemporary applications in digital advertising, e-commerce, web page categorization, text classification, bioinformatics, proteomics, banking services and many other areas.


For an idea of LinearSVM performance, see the two tables below. Table 1 shows results on the original MNIST data set, with 60,000 training examples and 10,000 test examples; the feature space (dimensionality of the input variables) has size 778. Table 2 shows results on the RCV1 binary classification problem, with 677,399 training examples and 20,242 test examples; the feature space has size 47,236. A laptop with an Intel Core 2 Duo T9400 (2.5 GHz) and 3 GB of RAM was used for these tests.






Table 1: Simulation results on the original 10-class MNIST data set

Table 2: Simulation results on the RCV1 binary classification problem

LinearSVM

Linear Support Vector Machine

by

Te-Ming Huang & Vojislav Kecman

v. 3.0, 2009

 

Please click the Download button to get the code. LinearSVM is free for scientific use only and must not be used for commercial activity of any kind. If you are unsure about the nature of your activities, please feel free to contact the authors. LinearSVM must not be redistributed.


If you are interested in using this software in commercial applications, please contact contact@linearsvm.com for license and pricing information. You may also contact the authors.

Note that for C = 100,000 on the RCV1 binary data set, with the lower precisions e = 0.03 and e = 0.04, LinearSVM needs only 8 and 7.5 seconds, respectively, to complete training, and the accuracy changes insignificantly.


To show the capacity of LinearSVM, we created an artificial three-dimensional data set with 5 million samples in four classes, normally distributed around the cube vertices (1, 1, 1), (-1, -1, -1), (-1, 1, 1), and (1, -1, -1). The first three classes have 1 million samples each, and the fourth has 2 million. Testing was performed on the same data set, and a stopping precision of 0.01 was used.
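
For readers who want to build a comparable data set, the following is a minimal Python sketch of the construction described above. It is not the authors' generator. The class centres and class sizes come from the text; the standard deviation (sigma), the random seed, and the function name make_cube_data are assumptions introduced for illustration only, since the text does not state the spread of the distributions.

```python
import random

# Class centres taken from the text: four vertices of the cube [-1, 1]^3.
CENTRES = [(1, 1, 1), (-1, -1, -1), (-1, 1, 1), (1, -1, -1)]
# Class sizes taken from the text: 1M, 1M, 1M and 2M (5 million in total).
FULL_SIZES = [1_000_000, 1_000_000, 1_000_000, 2_000_000]


def make_cube_data(sizes, sigma=0.5, seed=0):
    """Yield (features, label) pairs with features drawn from N(centre, sigma^2 I).

    sigma and seed are assumptions; the original description does not give them.
    """
    rng = random.Random(seed)
    for label, (centre, n) in enumerate(zip(CENTRES, sizes)):
        for _ in range(n):
            # Each coordinate is an independent Gaussian around the centre.
            yield tuple(rng.gauss(c, sigma) for c in centre), label


# Scaled-down run (1/1000 of the full size) so the sketch finishes quickly.
small_sizes = [n // 1000 for n in FULL_SIZES]
data = list(make_cube_data(small_sizes))
```

Passing FULL_SIZES instead of small_sizes yields the full 5 million samples; in that case it is probably better to consume the generator lazily than to build a list. Samples come out grouped by class, so shuffle them first if the training routine is sensitive to ordering.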



Copyright Huang and Kecman © 2000 - 2009 All Rights Reserved

Table 3: Simulation results on the artificial data set with 5 million examples