
Training the SVM to Larger Dataset Applications Using the SVM Sampling Technique

G.M. Sangeetha, Prashanth Prashanth

Abstract


With increasing amounts of data being generated by businesses and researchers, there is a need for fast, accurate and robust algorithms for data analysis. Improvements in database technology, computing performance and artificial intelligence have contributed to the development of intelligent data analysis. The primary aim of data mining is to discover patterns in the data that lead to a better understanding of the data-generating process and to useful predictions. Examples of applications of data mining include detecting fraudulent credit card transactions, character recognition in automated zip code reading, and predicting compound activity in drug discovery. Real-world data sets are often characterized by having large numbers of examples, e.g. billions of credit card transactions and potential ‘drug-like’ compounds; by being highly unbalanced, e.g. most transactions are not fraudulent and most compounds are not active against a given biological target; and by being corrupted by noise. The relationship between the predictive variables, e.g. physical descriptors, and the target concept, e.g. compound activity, is often highly non-linear. One recent technique developed to address these issues is the support vector machine (SVM), a robust tool for classification and regression in noisy, complex domains. The two key features of support vector machines are generalization theory, which leads to a principled way to choose a hypothesis, and kernel functions, which introduce non-linearity into the hypothesis space without explicitly requiring a non-linear algorithm. In this paper we introduce support vector machines, the cascade SVM and the randomized sampling technique, highlight their advantages over existing data analysis techniques, and note some important points for the data mining practitioner who wishes to use support vector machines.
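As an illustrative sketch only (not the authors' implementation), the two ideas named in the abstract can be combined in a few lines: an RBF kernel supplies the non-linearity, and a uniform random subsample stands in for the full training set, in the spirit of the randomized sampling technique. The dataset, sample sizes and scikit-learn usage below are assumptions for demonstration.

```python
# Sketch: RBF-kernel SVM trained on a random subsample of a larger dataset.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic "large" dataset: two classes separated by a non-linear (circular) boundary.
n = 20000
X = rng.normal(size=(n, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)

# Randomized sampling: train on a small uniform subsample instead of all n points.
idx = rng.choice(n, size=2000, replace=False)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")  # kernel function introduces non-linearity
clf.fit(X[idx], y[idx])

# Evaluate on another random draw from the full set.
test_idx = rng.choice(n, size=2000, replace=False)
accuracy = clf.score(X[test_idx], y[test_idx])
print(f"subsample-trained accuracy: {accuracy:.3f}")
```

Because SVM training cost grows super-linearly in the number of examples, training on a subsample (or, in the cascade SVM, merging support vectors from partial models) is what makes kernel SVMs practical on large data.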


Keywords


Support Vector Machine; SVM; Machine Learning; Multiprocessing; Scalability; Performance; Randomized Algorithm


References


S. Keerthi, S. Shevade, C. Bhattacharyya, and K. Murthy, “Improvements to Platt’s SMO algorithm for SVM classifier design,” Neural Computation, vol. 13, no. 3, pp. 637–649, 2001.

T. Joachims, “Training linear SVMs in linear time,” in Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2006, pp. 217–226.

P. Chen, R. Fan, and C. Lin, “A study on SMO-type decomposition methods for support vector machines,” IEEE Transactions on Neural Networks, vol. 17, no. 4, pp. 893–908, 2006.

G. Loosli and S. Canu, “Comments on the core vector machines: Fast SVM training on very large data sets,” The Journal of Machine Learning Research, vol. 8, pp. 291–301, 2007.

H. Graf, E. Cosatto, L. Bottou, I. Dourdanovic, and V. Vapnik, “Parallel support vector machines: The Cascade SVM,” Advances in Neural Information Processing Systems, vol. 17, pp. 521–528, 2004.

M. Mavroforakis, M. Sdralis, and S. Theodoridis, “A geometric nearest point algorithm for the efficient solution of the SVM classification task,” IEEE Transactions on Neural Networks, vol. 18, no. 5, pp. 1545–1549, 2007.

J. Balcázar, Y. Dai, and O. Watanabe, “A random sampling technique for training support vector machines,” in Algorithmic Learning Theory. Springer, 2001, pp. 119–134.

D. Brugger, Parallel Support Vector Machines. Universitätsbibliothek Tübingen, 2006.

A. Frank and A. Asuncion, “UCI machine learning repository,” 2010. [Online]. Available: http://archive.ics.uci.edu/ml.

