Advancing big data clustering with fuzzy logic-based IMV-FCA and ensemble approach

Document Type : Research Paper

Authors

1 Research Scholar, Jawaharlal Nehru Technological University, Kakinada, Andhra Pradesh, India

2 Department of Computer Science and Engineering, Velagapudi Ramakrishna Siddhartha Engineering College, Kanuru, Vijayawada, Andhra Pradesh, India

Abstract

Big data analysis (BDA) is the process of collecting, examining, and analyzing large volumes of data to find
patterns, insights, and market trends that help businesses make more effective decisions. Quick and effective access
to this data allows businesses to adapt their strategies and keep their competitive edge. To analyze massive amounts
of data quickly through parallel processing, the Hadoop framework employs the MapReduce programming model.
BDA requires substantial computational resources, which are not always available, so developing new clustering
techniques that can handle this kind of data processing has become crucial. In this research, we therefore present a
novel, effective fuzzy-based Improved Multiview Fuzzy C-Means Algorithm (IMV-FCA) to strengthen the clustering
strategy. The fuzzy-based IMV-FCA clustering incorporates an ensemble of the MobileNet V2 model and a
three-layer stacked Bidirectional LSTM (MVSBiLSTM) to increase computing speed and effectiveness. It also
introduces a function that computes the distance between a cluster center and a given instance to support better
clustering. By simulating a shared memory space and parallelizing the method with MapReduce on the Hadoop cloud
computing platform, a distributed database is used to improve the method's effectiveness while reducing its time
complexity. The proposed approach was evaluated against existing approaches on three standard datasets. Compared
with the existing approaches, it yields better performance across various metrics.
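
For orientation, the role of the center-to-instance distance in fuzzy clustering can be illustrated with the standard fuzzy c-means update. The following is a minimal sketch of that textbook algorithm, not of IMV-FCA itself: the Euclidean distance, the fuzzifier m = 2, the toy data, and all function names are assumptions made for illustration.

```python
import math

def euclidean(a, b):
    """Distance between an instance and a cluster center (Euclidean is an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def memberships(points, centers, m=2.0, eps=1e-12):
    """Standard fuzzy c-means membership update: u[i][k] for point i and center k."""
    power = 2.0 / (m - 1.0)
    u = []
    for p in points:
        # Clamp distances so a point lying exactly on a center does not divide by zero.
        d = [max(euclidean(p, c), eps) for c in centers]
        row = [1.0 / sum((d[k] / d[j]) ** power for j in range(len(centers)))
               for k in range(len(centers))]
        u.append(row)
    return u

def update_centers(points, u, m=2.0):
    """Each center is the mean of all points, weighted by membership ** m."""
    dim = len(points[0])
    centers = []
    for k in range(len(u[0])):
        w = [u[i][k] ** m for i in range(len(points))]
        total = sum(w)
        centers.append([sum(w[i] * points[i][d] for i in range(len(points))) / total
                        for d in range(dim)])
    return centers

def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)

# Toy data (hypothetical): two well-separated groups in the plane.
points = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]]
centers, u = fuzzy_c_means(points, centers=[[0.0, 1.0], [4.0, 4.0]])
```

Each row of `u` holds one instance's degrees of membership across the clusters and sums to 1. In a MapReduce setting the membership step could plausibly run per data partition in the map phase, with the weighted sums for the centers combined in the reduce phase, though the abstract does not specify how the work is split.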
