QPMMCOA and Bayesian Fuzzy Clustering: Novel Approaches for Optimizing Queries in Big Data

Document Type : Research Paper

Authors

1 Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India

2 Koneru Lakshmaiah Education Foundation, Vaddeswaram, Andhra Pradesh, India

Abstract

The explosion of data over the last ten years has drawn substantial attention to big data (BD) in the information field. Query optimization (QO) is crucial to data retrieval in BD environments. Several distributed data processing platforms in the cloud have been developed to provide BD query optimization services that are both affordable and effective. Nevertheless, because most of these solutions neglect energy-related concerns and query characteristics, they result in higher energy consumption (EC) and lower accuracy. To overcome this issue, we introduce a deep learning approach for arranging big data. This work presents an effective query optimization technique that combines the Quantum parallel multi-layer Monte Carlo optimization method (QPMMCOA) optimizer with a load balancer based on Bayesian fuzzy clustering. The proposed technique has two phases: (1) big data arrangement and (2) query optimization. The first phase arranges BD through preprocessing, feature extraction, feature selection, and deep learning-based BD arrangement. The essential features are selected using the Chaotic Vertex Search Algorithm (CVSA), and the BD arrangement is performed by the Improved Deep Residual Shrinkage Network (IDRSN) algorithm. In the second phase, the Bayesian fuzzy clustering-based load balancer is used with the QPMMCOA optimizer to improve overall query processing performance and avoid energy-inefficient query plans. Finally, similarity evaluation is carried out. The experimental results show that the proposed method performs better than existing algorithms.
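As a rough illustration of the load-balancing idea in the second phase, the sketch below groups queries by soft cluster membership and routes each one to the node whose cluster it belongs to most strongly. It uses plain fuzzy c-means, not the Bayesian fuzzy clustering named in the abstract, and it does not model the QPMMCOA optimizer. The function names, the two query features (estimated cost and rows scanned), and the sample values are all hypothetical.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means (a stand-in, not the Bayesian variant).

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships, normalized so each row sums to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Centers are membership-weighted means of the points.
        centers = []
        for k in range(n_clusters):
            weights = [u[i][k] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])
        # Memberships are recomputed from distances to the new centers.
        for i, p in enumerate(points):
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            u[i] = [v / total for v in inv]
    return centers, u


def assign_to_nodes(memberships):
    """Route each query to the node (cluster) with the highest membership."""
    return [max(range(len(row)), key=row.__getitem__) for row in memberships]


# Hypothetical query features: (estimated cost, rows scanned), both scaled.
queries = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8),
           (8.0, 8.2), (7.9, 8.1), (8.2, 7.8)]
centers, u = fuzzy_c_means(queries, n_clusters=2)
nodes = assign_to_nodes(u)
print(nodes)
```

Because memberships are soft, a real balancer could also split borderline queries across nodes or weigh membership against current node load; the hard assignment here is only the simplest choice.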

