A computational approach for the identification of small GTPases based on preprocessed amino acid sequences
In: Technology in Cancer Research and Treatment, Vol. 8 (2009), No. 5, pp. 333-342
Journal article / Subject: Biology; Medicine; Computer Science
Predicting essential biological features from a given protein sequence is a challenging task in computational biology. Such predictions make it possible to detect previously unknown sequences with similar properties while limiting the amount of in vitro verification required. Besides identifying proteins involved in tumorigenesis, the approach can be used to predict other functional classes of proteins. The prediction accuracy depends on the selected machine learning approach and, even more, on the composition of the descriptor set used. A computational approach based on feedforward neural networks was applied to the prediction of small GTPases. Secondary structure and hydrophobicity information were used as a preprocessing step and thus served as descriptors for the neural networks. We developed a neural network cluster consisting of a filter network and four subfamily networks. The filter network was trained to identify small GTPases, and the subfamily networks were trained to assign a small GTPase to one of the subfamilies. The accuracy of predicting whether a given sequence represents a small GTPase is very high (98.25%). The classifications of the subfamily networks yield comparable accuracy. The high prediction accuracy of the neural network cluster suggests that hydrophobicity and secondary structure prediction, combined with a neural network cluster, is a promising method for predicting essential biological activities.
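The two-stage decision flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The Kyte-Doolittle hydropathy scale, the sliding-window size, the 0.5 threshold, the function names, and the placeholder subfamily labels are all assumptions; the abstract does not specify them. Secondary structure descriptors are omitted for brevity, and the trained networks are represented by arbitrary scoring callables.

```python
from typing import Callable, Dict, List, Optional

# Kyte-Doolittle hydropathy values.
# Assumption: the abstract does not say which hydrophobicity scale was used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobicity_profile(sequence: str, window: int = 5) -> List[float]:
    """Return the sliding-window mean hydropathy of a protein sequence.

    Unknown residue codes contribute 0.0. The window size is a hypothetical
    choice made for this illustration.
    """
    values = [KYTE_DOOLITTLE.get(res, 0.0) for res in sequence.upper()]
    if len(values) < window:
        return []
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# A "network" here is any callable mapping descriptors to a score in [0, 1].
Scorer = Callable[[List[float]], float]


def classify(
    descriptors: List[float],
    filter_net: Scorer,
    subfamily_nets: Dict[str, Scorer],
    threshold: float = 0.5,
) -> Optional[str]:
    """Two-stage cluster: filter first, then pick the best-scoring subfamily.

    Returns None when the filter network rejects the sequence, otherwise the
    label of the subfamily network with the highest score.
    """
    if filter_net(descriptors) < threshold:
        return None
    return max(subfamily_nets, key=lambda name: subfamily_nets[name](descriptors))
```

In use, `filter_net` and the four entries of `subfamily_nets` would be trained feedforward networks; any callable with the same signature can stand in for them when exercising the control flow.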