GENERATING FUZZY RULES FOR PROTEIN CLASSIFICATION

Document Type: Research Paper

Authors

1 COMPUTER SCIENCE AND ENGINEERING DEPARTMENT, COLLEGE OF ENGINEERING, SHIRAZ UNIVERSITY, SHIRAZ, IRAN

2 BIOLOGY DEPARTMENT, COLLEGE OF SCIENCE, SHIRAZ UNIVERSITY, SHIRAZ, IRAN

Abstract

This paper considers the generation of interpretable fuzzy rules for
assigning an amino acid sequence to the appropriate protein superfamily.
Since the main objective of this classifier is the interpretability of its
rules, we use the distribution of amino acids in the protein sequences as
features. These features are the occurrence probabilities of six exchange
groups in the sequences. To generate the fuzzy rules, we use modified
versions of a common rule-generation approach. The generated rules are
simple and understandable, especially for biologists. To evaluate our fuzzy
classifiers, we use four protein superfamilies from the UniProt database.
Experimental results show that the generated fuzzy rules are comprehensible
while achieving comparable classification accuracy.
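
As an illustration of the feature extraction described above, the following minimal sketch computes the occurrence probability of each exchange group in a sequence. The abstract does not list the six groups, so the grouping used here (the one commonly found in the protein-classification literature) is an assumption, as are the names `EXCHANGE_GROUPS` and `exchange_group_features`. It is not the authors' implementation.

```python
from collections import Counter

# ASSUMPTION: the abstract does not define the six exchange groups.
# This is the grouping commonly used in the literature; the labels
# e1..e6 are illustrative names only.
EXCHANGE_GROUPS = {
    "e1": "HRK",
    "e2": "DENQ",
    "e3": "C",
    "e4": "STPAG",
    "e5": "MILV",
    "e6": "FYW",
}

# Reverse lookup: one-letter amino acid code -> exchange group label.
AA_TO_GROUP = {
    aa: group for group, members in EXCHANGE_GROUPS.items() for aa in members
}


def exchange_group_features(sequence):
    """Return the occurrence probability of each exchange group.

    Characters that are not one of the 20 standard amino acid codes
    are ignored, so the probabilities sum to 1 over recognised residues.
    """
    counts = Counter(
        AA_TO_GROUP[aa] for aa in sequence.upper() if aa in AA_TO_GROUP
    )
    total = sum(counts.values())
    if total == 0:
        # No recognised residues: return an all-zero feature vector.
        return {group: 0.0 for group in EXCHANGE_GROUPS}
    return {group: counts[group] / total for group in EXCHANGE_GROUPS}
```

Calling `exchange_group_features("MKVLAAGC")` returns a dictionary with one probability per group. Each sequence is thus reduced to a six-dimensional feature vector, over which fuzzy rules can then be stated.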
