ON FUZZY NEIGHBORHOOD BASED CLUSTERING ALGORITHM WITH LOW COMPLEXITY

Document Type : Research Paper

Authors

1 Department of Industrial Engineering, Izmir University, Gursel Aksel Blv 14, Uckuyular, Izmir, Turkey

2 Department of Computer Science, Dokuz Eylul University, Izmir, 35160, Turkey, Institute of Cybernetics, Azerbaijan National Academy of Sciences, Azerbaijan

Abstract

The main purpose of this paper is to achieve improvement in the
speed of Fuzzy Joint Points (FJP) algorithm. Since FJP approach is a basis
for fuzzy neighborhood based clustering algorithms such as Noise-Robust FJP
(NRFJP) and Fuzzy Neighborhood DBSCAN (FN-DBSCAN), improving FJP
algorithm would an important achievement in terms of these FJP-based meth-
ods. Although FJP has many advantages such as robustness, auto detection
of the optimal number of clusters by using cluster validity, independency from
scale, etc., it is a little bit slow. In order to eliminate this disadvantage, by im-
proving the FJP algorithm, we propose a novel Modi ed FJP algorithm, which
theoretically runs approximately n= log2 n times faster and which is less com-
plex than the FJP algorithm. We evaluated the performance of the Modi ed
FJP algorithm both analytically and experimentally.

Keywords


[1] R. Agrawal, J. Gehrke, D. Gunopulos and P. Raghavan, Automatic subspace clustering of
high dimensional data for data mining applications, Proceedings of the 1998 ACM-SIGMOD
Int. Conference on Management of Data, Seattle, Washington, June 1998.
[2] M. Ankerst, M. M. Breunig, H. P, Kriegel and J. Sander, OPTICS: ordering points to identify
the clustering structure, In: Proceedings of ACM SIGMOD International Conference on
Management of Data, Philadelphia, PA, (1999), 49{60.
[3] A. M. Bensaid, L. O. Hall, J. C. Bezdek, L. P. Clarke, M. L. Silbiger, J. A. Arrington and R.
F. Murtagh, Validity-guided (re)clustering with applications to image segmentation, IEEE
Transactions on Fuzzy Systems, 4 (1996), 112{123.
[4] J. C. Bezdek, Fuzzy mathematics in pattern classification, PhD Thesis, Cornell Univ, NY,
1973.
[5] M. H. Chehreghani, H. Abolhassani and M. H. Chehreghani, Improving density-based methods
for hierarchical clustering of web pages, Data & Knowledge Engineering, 67 (2008), 30{50.
[6] T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to algorithms, The
MIT Press, 2001.
[7] A. P. Dempster, N. M. Laird and D. B. Rubin, Maximum likelihood from incomplete data
via the EM algorithm, Journal of Royal Statistical Society, Series B, 39 (1977), 1{38.
[8] J. C. Dunn, A fuzzy relative of the ISODATA process and its use in detecting compact wellseparated
clusters, Journal of Cybernetics, 3(3) (1973), 32{57.
[9] M. Ester, H. P. Kriegel, J. Sander and X. Xu, A density-based algorithm for discovering clusters
in large spatial databases with noise, In Proc. 2nd International Conference on Knowledge
Discovery and Data Mining, (1996), 226{231.
[10] D. Fisher, Knowledge acquisition via conceptual clustering, Machine Learning, 2 (1987),
139{172.
[11] S. Guha, R. Rastogi and K. Shim, CURE: an efficient clustering algorithms for large
databases, In: Proceeding ACM SIGMOD International Conference on Management of Data,
Seattle, WA, (1998), 73{84.
[12] J. Han and M. Kamber, Data mining concepts and techniques, Morgan Kaufmann Publishers,
San Francisco, CA, 2001.
[13] A. Hinneburg and A. K. Daniel, An efficient approach to clustering in large multimedia
databases with noise, Proceedings of the 4th Int. Conference on Knowledge Discovery and
Data Mining (KDD98), New York, (1998), 58{65.
[14] P. Hore, L. O. Hall, D. B. Goldgof, Y. GU, A. A. Maudsley and A. Darkazanli, A scalable
framework for segmenting magnetic resonance images, Journal of Signal Processing Systems,
54 (2009), 183{203.
[15] E. Januzaj, H. P. Kriegel and M. Pfei
e, DBDC: density based distributed clustering, 5th
International Conference on Extending Database Technology (EDBT), Heraklion, Greece,
(2004a), 88{105.
[16] E. Januzaj, H. P. Kriegel and M. Pfei
e, Scalable density based distributed clustering, 15th
International Conference on Machine Learning (ECML) and the 8th European Conference on
Principles and Practice of Knowledge Discovery in Databases (PKDD), Pisa, Italy, 2004b.
[17] G. Karypis, E. H. Han and V. Kumar, CHAMELEON: a hierarchical clustering algorithm
using dynamic modeling, IEEE Computer, 32(8) (1999), 68{75.
[18] L. Kaufman and P. J. Rousseuw, Finding groups in data: an introduction to cluster analysis,
John Wiley&Sons, Inc, 1990.
[19] E. Mehdizadeh, S. Sadi-Nezhad and R. Tavakkoli-Moghaddam, Optimization of fuzzy clustering
criteria by a hybrid PSO and fuzzy c-means clustering algorithm, Iranian Journal of
Fuzzy Systems, 5(3) (2008), 1{14.
[20] E. N. Nasibov and G. Ulutagay, A new approach to clustering problem using the fuzzy joint
points method, Automatic Control and Computer Sciences, 39(6) (2005), 8{17.
[21] E. N. Nasibov and G. Ulutagay, On the fuzzy joint points method for fuzzy clustering problem,
Automatic Control and Computer Sciences, 40(5) (2006), 33{44.
[22] E. N. Nasibov and G. Ulutagay, A new unsupervised approach for fuzzy clustering, Fuzzy
Sets and Systems, 158(19) (2007), 2118{2133.
[23] E. N. Nasibov and G. Ulutagay, Robustness of density-based clustering methods with various
neighborhood relations, Fuzzy Sets and Systems, 160(24) (2009), 3601{3615.
[24] T. R. Ng and J. Han, Efficient and effective clustering methods for spatial data mining,
Proceedings of the 20th Very Large Databases Conference (VLDB94), Santiago, Chile, (1994),
144{155.
[25] J. K. Parker, L. O. Hall and A. Kandel, Scalable fuzzy neighborhood DBSCAN, IEEE Interna-
tional Conference on Fuzzy Systems, Barcelona, Spain, doi: 10.1109/FUZZY.2010.5584527,
(2010), 1{8.
[26] W. Pedrycz and F. Gomide, An introduction to fuzzy sets, MIT Press, MA, 1998.
[27] W. Pedrycz, Distributed and collaborative fuzzy modeling, Iranian Journal of Fuzzy Systems,
4(1) (2007), 1{19.
[28] J. Sander, M. Ester, H. P. Kriegel and X. Xu,Density-based clustering in spatial databases:
the algorithm GDBSCAN and its applications, Data Mining and Knowledge Discovery, 2
(1998), 169{194.
[29] G. Sheikholeslami, S. Chatterjee and A. Zhang, WaveCluster: a multi-resolution clustering
approach for very large spatial databases, Proceedings of the 24th Very Large Databases
Conference (VLDB 98), New York, 1998.
[30] G. Ulutagay and E. Nasibov, Fuzzy and crisp clustering methods based on the neighborhood
concept: a comprehensive review, Journal of Intelligent and Fuzzy Systems, 23 (2012), 271{
281.
[31] W. Wang, Y. Jiong and R. Muntz, STING: a statistical information grid approach to spatial
data mining, Proceedings of the 23rd Very Large Databases Conference (VLDB 1997), Athens,
Greece, 1997.
[32] X. Xiaowei, E. Martin, H. P. Kriegel and J. Sander, A distribution-based clustering algorithm
for mining in large spatial databases, Proceedings of the 14th Int. Conference on Data
Engineering (ICDE98), Orlando, Florida, (1998), 324{331.
[33] A. Z. Xu, J. Chen and J. Wu, Clustering algorithm for intuitionistic fuzzy sets, Information
Sciences, 178 (2008), 3775{3790.
[34] R. R. Yager and D. P. Filev, Approximate clustering via the mountain method, IEEE Trans-
actions on Systems, Man and Cybernetics, 24(8) (1994) , 1279{1284.
[35] X. L. Yang, Q. Song and Y. L. Wu, A robust deterministic algorithm for data clustering,
Data & Knowledge Engineering, 62 (2007), 84{100.
[36] T. Zhang, R. Ramakrishnan and M. Livny, BIRCH: an efficient data clustering method for
very large databases, Proceedings of the 1996 ACM SIGMOD Int. Conference on Management
of Data, Montreal, Canada, (1996), 103{113.