SSP Program Description

SSP - Prediction of a-helix and b-strand segments of globular proteins

(Version 2. 10.5.94)

Department of Cell Biology, Baylor College of Medicine


Analysis of amino acid sequences is available through MOSAIC or by sending a file containing your sequence (in the format described below) to service@bchs.uh.edu or services@bioinformatics.weizmann.ac.il with the subject line "ssp".

Examples: mail -s ssp service@bchs.uh.edu < test.seq

mail -s ssp services@bioinformatics.weizmann.ac.il < test.seq

where test.seq is a file containing the sequence.

Method description:
Our segment-oriented method is designed to locate secondary structure elements. It uses linear discriminant analysis to assign segments of a given amino acid sequence to a particular type of secondary structure, taking into account the amino acid composition of the internal parts of segments as well as their terminal and adjacent regions. Four linear discriminant functions were constructed for recognition of short and long a-helix segments and short and long b-strand segments. These functions combine 3 characteristics: the hydrophobic moment, and the segment singlet and pair preferences for an a-helix or b-strand. To improve the prediction accuracy of the method, a simple version has been developed that accepts multiple sequence alignments as input in place of single sequences.
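
The following is only an illustrative sketch of two ingredients named above, not the SSP implementation. It computes a hydrophobic moment for a segment and evaluates a generic linear discriminant function. The Kyte-Doolittle scale, the per-residue turn angles (about 100 degrees for an a-helix, 160-180 degrees for a b-strand), the normalisation by segment length, and the function names are all assumptions; the source does not state which scale, angles, weights or thresholds SSP uses.

```python
import math

# Kyte-Doolittle hydropathy values. ASSUMPTION: the source does not say
# which hydrophobicity scale SSP uses.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydrophobic_moment(segment, angle_deg):
    """Mean hydrophobic moment of a segment for a given turn angle.

    angle_deg is the assumed rotation per residue (hypothetical defaults
    would be ~100 for an a-helix and ~160-180 for a b-strand).
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[aa] * math.sin(i * delta)
                  for i, aa in enumerate(segment))
    cos_sum = sum(KYTE_DOOLITTLE[aa] * math.cos(i * delta)
                  for i, aa in enumerate(segment))
    return math.hypot(sin_sum, cos_sum) / len(segment)


def linear_discriminant(features, weights, threshold):
    """Generic linear discriminant: weighted sum of features minus a threshold.

    A positive value would assign the segment to the class in question.
    The weights and threshold are placeholders, not SSP's fitted values.
    """
    return sum(w * x for w, x in zip(weights, features)) - threshold
```

In such a scheme the feature vector for a candidate segment would hold the three characteristics (hydrophobic moment, singlet preference, pair preference), with one set of weights per segment class.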

Accuracy:
Overall 3-state (a, b, c) prediction gives ~65.1% correctly predicted residues on 126 non-homologous proteins using the jack-knife test procedure. This accuracy is good when no homologous sequences are available: the method of Sander et al. (Rost, Sander, J. Mol. Biol., 1993, 232, 584-599) has about 71% accuracy when using such sequences and about 61% without them. Analysis of the prediction results shows a high prediction accuracy for long secondary structure segments: ~89% of a-helices of length greater than 8 and ~71% of b-strands of length greater than 6 are correctly located, with probability of correct prediction 0.82 and 0.78, respectively. Using the mean values of discriminant functions over the aligned sequences of homologous proteins, we achieved a prediction accuracy of 68.2%. Note that our variant of the nearest-neighbor algorithm has 72% accuracy when using multiple sequence alignments of homologous proteins and 67.6% accuracy without homologous proteins (see the "nnssp" program of this server).
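
For readers unfamiliar with the measure, the percentage of correctly predicted residues over three states can be computed as in the minimal sketch below. The function name and the use of one-letter state strings are assumptions made for illustration; the jack-knife procedure itself (leaving each protein out of the training set in turn) is not shown.

```python
def q3_accuracy(predicted, observed):
    """Percentage of residues whose predicted state (a, b or c) matches
    the observed state. Both arguments are equal-length strings."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)
```

For example, a prediction "aabbc" compared with an observed assignment "aabcc" agrees at four of five positions.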

SEE ALSO "nnssp" program of this server.

Submitting sequences via email:

For email submission the sequences must have the following format:

a) if you send one sequence:

  line 1 - sequence name 
  line 2 - the number 1 in format I5 (right-justified in a 5-character field)
  line 3 and subsequent lines - amino acid sequence

   Sequence length must be less than 2000 amino acids !!!

  for example:

  ADENYLATE KINASE     
      1		
  RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
  KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY
  QIDTVINLNVPFEVIKQRLTARWIHPGSGRVYNIEFNPPKTMGIDDLTGE
  PLVQREDDRPETVVK............
   (Restrict the line length to 75 characters).
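
A file in the single-sequence format can be produced with a few lines of Python. This is a sketch under stated assumptions: the helper name format_single_sequence is hypothetical, the default of 50 residues per line simply mirrors the example above, and sequence lines are written without leading spaces.

```python
def format_single_sequence(name, sequence, width=50):
    """Return the text of a single-sequence submission.

    Line 1 is the name, line 2 is the number 1 in a 5-character
    right-justified field (Fortran I5), and the remaining lines hold the
    sequence, wrapped to `width` characters (at most 75).
    """
    seq = "".join(sequence.split())          # drop spaces and newlines
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) >= 2000:
        raise ValueError("sequence length must be less than 2000")
    if not 1 <= width <= 75:
        raise ValueError("line width must be between 1 and 75")
    lines = [name, f"{1:5d}"]
    lines += [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

The returned string can be written to a file such as test.seq and mailed as described at the top of this page.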

b) if you send multiple aligned sequences

  line 1 - sequence name 
  line 2 - number of aligned sequences and length of protein
  line 3 and subsequent lines - aligned sequences in format 60a1  
    (where the 3rd line, like the other lines that separate parts of
     the aligned sequences, is empty or contains position numbers)

The number of aligned sequences must be less than 250 !!!
	
 for example:

ACTINOXANTHIN                                                         
    5  107
        10        20        30        40        50        60 (numbers not    
APAFSVSPASGASDGQSVSVSVAAAGETYYIAQaAPVGGQDAaNPATATSFTTDASGAAS  necessary) 
APAFSVSPASGLSDGQSVSVSGAAAGETYYIAQCAPVGGQDACNPATATSFTTDASGAAS
APTATVTPSSGLSDGTVVKVAGAgaGTAYDVGQCAWVdgVLACNPADFSSVTADANGSAS
APGVTVTPATGLSNGQTVTVSATgpGTVYHVGQCAVvpGVIGCDATTSTDVTADAAGKIT
ATPKSSSGGAGASTGSGTSSAAVTSgaASSAQQSGLQGATGAGGGSSSTPGTQPGSGAGG
        70        80        90       100      
FSFTVRKSYAGQTPSGTPVGSVDbATDAbNLGAGNSGLNLGHVALTF
FSFV-RKSYAGZTPSGTPVGSVDCATDACNLGAGNSGLNLGHVALTF
TSLTVRRSFEGFLFDGTRWGTVDCTTAACQVGLSDAAGNGpgVAISF
AQLKVHSSFQAVvaNGTPWGTVNCKVVSCSAGLGSDSGEGAAQAITF
AIAARPVSAMGGtpPHTVPGSTNTTTTAMAGGVGGPgaNPNAAALM-
 
(You may use lowercase letters for Cys residues if you wish.)

The alignment MUST NOT contain deletions (gaps) in the 1st (query) sequence!!!
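
The alignment format and its constraints can be sketched in Python as follows. The helper name format_alignment is hypothetical. Two details are inferred from the example rather than stated by the source: the second line is written as two 5-character right-justified integers, and "-" is taken to be the gap character. Each block of 60 columns is preceded by an empty separator line.

```python
def format_alignment(name, sequences, block=60):
    """Return the text of a multiple-alignment submission.

    `sequences` is a list of equal-length aligned strings; the first one
    is the query and must not contain gaps.
    """
    if not sequences:
        raise ValueError("no sequences given")
    if len(sequences) >= 250:
        raise ValueError("number of aligned sequences must be less than 250")
    query = sequences[0]
    length = len(query)
    if any(len(s) != length for s in sequences):
        raise ValueError("all aligned sequences must have the same length")
    if "-" in query:
        raise ValueError("query (first) sequence must not contain gaps")
    # Header: sequence count and protein length (assumed 2I5 layout).
    lines = [name, f"{len(sequences):5d}{length:5d}"]
    for start in range(0, length, block):
        lines.append("")                      # separator line before each block
        lines.extend(s[start:start + block] for s in sequences)
    return "\n".join(lines) + "\n"
```

Checking the query for gaps before submission avoids sending an alignment that the server cannot use.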

Example of SSP output:
   ADENYLATE KINASE     
                    10        20        30        40        50
   pred A:    aaaaaaaaa          aaaaaaaaa     aaaaaaaaa     aaa
   AA         N  4.1  C          N  2.2  C     N  4.4  C     N  
   pred B:                  bbbb                                
   BB                       N2 C                                
   Predic     aaaaaaaaa     bbbb aaaaaaaaa     aaaaaaaaa     aaa
   a/acid     RLLRAIMGAPGSGKGTVSSRITKHFELKHLSSGDLLRDNMLRGTEIGVLA
                    60        70        80        90       100
   pred A:    aaaaaa       aaaaaaaaaaaaaaaaaaaaaaa     aaaaaaaaa
   AA         2.2  C       N    4.2    CN   2.4  C     N  5.4  C
   pred B:                 bbbbbbb                              
   BB                      N 2.6 C                              
   Predic     aaaaaa       aaaaaaaaaaaaaaaaaaaaaaa     aaaaaaaaa
   a/acid     KTFIDQGKLIPDDVMTRLVLHELKNLTQYNWLLDGFPRTLPQAEALDRAY

The output of the prediction program presents not only the final optimal variant of the secondary structure assignment ("Predic" lines), but also a set of potential a-helix and b-strand segments ("pred A:" and "pred B:" lines) that were computed without considering their competition with one another. Because the protein secondary structure is finally stabilized during the formation of the tertiary structure, the alternative variants of the a-helix and b-strand segments may be important for methods of tertiary structure prediction.
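
To reuse the final assignment in other programs, the "Predic" and "a/acid" rows can be joined across blocks. The sketch below is not part of SSP. It assumes, based only on the example output above, that each row carries its label in the first 14 characters and its data from column 15 onward; the function name and the choice of "c" for unassigned positions are also assumptions.

```python
def parse_ssp_output(text, offset=14):
    """Extract (sequence, final prediction) from SSP output text.

    Blank positions in the "Predic" rows are reported as "c" (coil).
    """
    residues = []
    states = []
    pending = ""
    for line in text.splitlines():
        label = line[:offset].strip()
        data = line[offset:].rstrip()
        if label == "Predic":
            pending = data                    # keep until the matching a/acid row
        elif label == "a/acid":
            # Pad or trim the prediction row to the length of the residue row,
            # since trailing blanks may have been stripped.
            row = pending.ljust(len(data))[:len(data)]
            residues.append(data)
            states.append(row.replace(" ", "c"))
            pending = ""
    return "".join(residues), "".join(states)
```

The "pred A:" and "pred B:" rows could be collected the same way if the alternative segments are needed.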

Reference:

Solovyev V.V.,Salamov A.A.
  Method of calculation of discrete secondary structures 
  in globular proteins. Molek. Biol. 25:810-824,1991 (in Russ.)
Solovyev V.V.,Salamov A.A. 1994,
  Secondary structure prediction based on  discriminant analysis. 
  In Computer analysis of Genetic macromolecules. (eds. Kolchanov N.A., Lim 
  H.A.), World Scientific, p.352-364.
Solovyev V.V., Salamov A.A. Predicting a-helix and b-strand segments
  of globular proteins. CABIOS (1994), V.10,6,661-669 



Victor V.Solovyev, Department of Cell Biology, Baylor College of Medicine
solovyev@cmb.bcm.tmc.edu
Last modified: Wed Oct 20 17:21:46 CDT 1999