Signal peptide prediction in the space-frequency domain
Date
2006
Publisher
University of Delaware
Abstract
Bioinformatics makes use of applied mathematics, informatics, statistics and computer science to solve biological problems. With the explosion of biological data, the signal processing community is in a unique position to analyze the data using traditional and non-traditional signal processing methods.

Proteins are the building blocks of cells. A protein is a string of linked amino acids, each represented as a symbol in a 20-letter alphabet. Analyzing the symbolic amino acid sequences helps us find the characteristics of proteins and predict their secondary structures. In some cases, however, the characteristic patterns are too weak to be detected by analyzing the symbolic strings. In such cases, chemical property indices can be assigned to the amino acids to map the symbolic sequence to a numeric representation, and the analysis is performed on that representation.

A signal peptide is a short stretch of amino acids found at the beginning of a protein. It is typically rich in hydrophobic amino acids. One of the major functions of the signal peptide is to localize the protein to a specific region within the cell. Knowing whether a protein carries a specific signal peptide therefore provides an important clue to its likely location. In this thesis, we propose a new method to detect the presence of the signal peptide based on space-frequency processing of the numeric amino acid sequences.

More than 400 index mappings are available for mapping symbolic amino acid sequences to numeric representations, so the selection of the index mapping becomes an important issue. In this thesis, we propose a method to select index mappings that are suitable for signal peptide detection. To detect the presence of the signal peptide in an amino acid sequence, the numeric sequence is transformed to the space-frequency domain by the Wigner-Ville transform. We observed that amino acid sequences with a signal peptide tend to have smaller variance in the space-frequency domain than sequences without one. Based on this observation, we devised a new signal peptide detection algorithm that effectively differentiates between amino acid sequences with and without a signal peptide.

To evaluate our method, we use a dataset of 210 protein sequences, half of which have signal peptides. We use half of the dataset to learn the optimal parameters of the detection algorithm and the other half to test it. Experimental results show that our method detects the presence of signal peptides at a promising rate.
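
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, and several pieces are assumptions made only for illustration: the Kyte-Doolittle hydropathy scale stands in for whichever index mapping the thesis selects, the real-valued signal is fed to the transform directly (no analytic signal or smoothing window), only the first 40 residues are analyzed (the hypothetical `window` parameter), and the decision rule is a single variance threshold chosen to maximize training accuracy. All function names are hypothetical.

```python
import numpy as np

# Kyte-Doolittle hydropathy index. Assumption: used here only as one example
# of the 400+ available index mappings; the thesis selects its own.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def to_numeric(seq, index=KYTE_DOOLITTLE):
    """Map a symbolic amino acid sequence to a zero-mean numeric signal."""
    x = np.array([index[aa] for aa in seq.upper()], dtype=float)
    return x - x.mean()


def wigner_ville(x):
    """Discrete Wigner-Ville distribution of a 1-D signal.

    Rows are sequence positions, columns are frequency bins.
    """
    n = len(x)
    kernel = np.zeros((n, n), dtype=complex)
    for t in range(n):
        max_lag = min(t, n - 1 - t)  # largest lag that stays inside the signal
        lags = np.arange(-max_lag, max_lag + 1)
        # Instantaneous autocorrelation x[t + m] * conj(x[t - m]),
        # stored at index (m mod n) so that lag 0 comes first for the FFT.
        kernel[t, lags % n] = x[t + lags] * np.conj(x[t - lags])
    # The kernel is conjugate-symmetric in the lag, so its FFT is real.
    return np.fft.fft(kernel, axis=1).real


def sf_variance(seq, window=40):
    """Variance of the space-frequency representation of the N-terminal region."""
    x = to_numeric(seq[:window])
    return float(np.var(wigner_ville(x)))


def learn_threshold(variances, labels):
    """Pick the variance cut that maximizes training accuracy.

    labels[i] is True when training sequence i has a signal peptide.
    """
    variances = np.asarray(variances, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    candidates = np.sort(variances)
    accuracies = [np.mean((variances <= c) == labels) for c in candidates]
    return float(candidates[int(np.argmax(accuracies))])


def has_signal_peptide(seq, threshold, window=40):
    """Predict a signal peptide when the space-frequency variance is small."""
    return sf_variance(seq, window) <= threshold
```

In this sketch, a threshold would be learned by calling `sf_variance` on each training sequence and passing the results with their labels to `learn_threshold`; `has_signal_peptide` then applies that threshold to held-out sequences, mirroring the half-train, half-test split described above.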