
Comparative Study on Swarm Search Feature Selection for Big Data Stream Mining

S. Meera, Dr. B. Rosiline Jeetha

Abstract


Modern networking technology has developed rapidly and now handles huge amounts of data at a time. This data can be structured, semi-structured or unstructured. Big data technology is gaining importance because it enables efficient mining of valuable information from such data. Data mining applications are used in both the public and private sectors of industry because of their advantage over conventional networking technology in analyzing large volumes of real-time data. Big data is commonly characterized by three V's: Volume, Variety and Velocity. Volume refers to the huge amount of data collected, Velocity refers to the speed at which the data is processed, and Variety refers to the multi-dimensional nature of the data, which can include numbers, dates, strings, geospatial data, 3D data, audio files, video files, social files, etc. Data stored in a big data system comes from different sources, at different rates and in different types; hence it is not synchronized. This is one of the biggest challenges in working with big data. A second challenge is mining valuable and relevant information from such data while meeting the demands of the third V, Velocity. Speed is highly important because it is associated with the cost of processing.

On the other hand, when mining high-dimensional data, the search space from which an optimal feature subset must be determined grows in size, which makes the computation demanding. To handle these problems, research in this area generally builds on the high dimensionality and streaming structure of data feeds in big data, and proposes new lightweight feature selection methodologies suited to such data. Some of this research shows that different kinds of optimization methods for data stream mining can lead to significant changes in big data. This work discusses various research methods aimed at finding efficient feature selection techniques that avoid the main challenges and produce optimal solutions. The previous methods are described along with their advantages and disadvantages, so that further research can be better focused. Experiments on all the surveyed methods were conducted in a MATLAB simulation environment, and the methods were compared with one another to identify the better-performing methodologies under different performance measures.


Keywords


Big Data, Feature Selection, Particle Swarm Optimization, Classification






DOI: http://dx.doi.org/10.36039/AA012017003.



Creative Commons License
This work is licensed under a Creative Commons Attribution 3.0 License.