GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION

Document Type: Research Paper

Authors

1 COMPUTER SCIENCE AND ENGINEERING DEPARTMENT, COLLEGE OF ENGINEERING, SHIRAZ UNIVERSITY, SHIRAZ, IRAN

2 COMPUTER SCIENCE AND ENGINEERING DEPARTMENT, COLLEGE OF ENGINEERING, SHIRAZ UNIVERSITY, SHIRAZ, IRAN

3 BIOLOGY DEPARTMENT, COLLEGE OF SCIENCE, SHIRAZ UNIVERSITY, SHIRAZ, IRAN

Abstract

This paper considers the generation of interpretable fuzzy rules for
assigning an amino acid sequence to the appropriate protein superfamily.
Since the main objective of this classifier is the interpretability of its rules,
we use the distribution of amino acids in the protein sequences as features.
These features are the occurrence probabilities of six exchange groups in the
sequences. To generate the fuzzy rules, we use modified versions of a common
approach. The generated rules are simple and understandable, especially for
biologists. To evaluate our fuzzy classifiers, we use four protein superfamilies
from the UniProt database. Experimental results show that the generated fuzzy
rules are comprehensible while achieving comparable classification accuracy.
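
As an illustration of the feature extraction described above, the following is a minimal sketch of how the occurrence probabilities of six exchange groups could be computed for one sequence. The abstract does not list the groups, so the grouping used here (the Dayhoff-style exchange groups {H,R,K}, {D,E,N,Q}, {C}, {S,T,P,A,G}, {M,I,L,V}, {F,Y,W}) is an assumption, and the names `EXCHANGE_GROUPS` and `exchange_group_features` are hypothetical.

```python
from collections import Counter

# Assumed grouping (not given in the abstract): six Dayhoff-style exchange
# groups that together cover the 20 standard amino acids.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}

# Reverse lookup: one-letter residue code -> exchange group label.
RESIDUE_TO_GROUP = {
    residue: group
    for group, members in EXCHANGE_GROUPS.items()
    for residue in members
}


def exchange_group_features(sequence):
    """Return the relative frequency of each exchange group in a sequence.

    Characters outside the 20 standard residues (e.g. 'X') are ignored.
    """
    counts = Counter(
        RESIDUE_TO_GROUP[residue]
        for residue in sequence.upper()
        if residue in RESIDUE_TO_GROUP
    )
    total = sum(counts.values())
    if total == 0:
        return {group: 0.0 for group in EXCHANGE_GROUPS}
    return {group: counts[group] / total for group in EXCHANGE_GROUPS}


if __name__ == "__main__":
    # Toy sequence, not taken from any database.
    print(exchange_group_features("HHKC"))
```

Each sequence is thus reduced to six numbers in [0, 1] that sum to one, which keeps the antecedents of any fuzzy rule built on them small enough to read directly.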
