Match Box Output

  • Explore Report
  • Alignment Report
    
    

    Explore Report File

     ______________________________________________________________________
     ______________________________________________________________________
     MATCH-BOX_server 1.3                              13-May-99    20:01:05
    
     Molecular Biology. University of NAMUR - BELGIUM
    
     Internet address: matchbox@biq.fundp.ac.be
     WEB: http://www.fundp.ac.be/sciences/biologie/bms/matchbox_submit.html
    
     ______________________________________________________________________
    
     EEEEE   X   X  PPPP   L       O      RRRR    EEEEE
     E        X X   P   P  L     O   O    R   R   E
     EEE       X    PPPP   L     O   O    RRR     EEE
     E        X X   P      L     O   O    R  R    E
     EEEEE   X   X  P      LLLLL   O      R   R   EEEEE
    
     ______________________________________________________________________
    
      REFERENCE
    
      Match-Box_server: a multiple sequence alignment tool placing emphasis
      on reliability. E. Depiereux, G. Baudoux, P. Briffeuil, I. Reginster,
      X. De Bolle, C. Vinals and E. Feytmans (1997) CABIOS 13(3) 249-256
    
      Please choose the font Monaco or Courier 11.
    
      Table 1 : submitted sequences, as read by the mailer.
      ----------------------------------------------------
    
      Please check that they are correct and do not contain embedded comments.
      The current score matrix used for the sequence analysis is 
      blosum45.sco                           
    
    >gi          sequence number   1  190 aa 
    MATHHTLWMG LALLGVLGDL QAAPEAQVSV QPNFQQDKFL GRWFSAGLAS NSSWLREKKA ALSMCKSVVA
    PATDGGLNLT STFLRKNQCE TRTMLLQPAG SLGSYSYRSP HWGSTYSVSV VETDYDQYAL LYSQGSKGPG
    EDFRMATLYS RTQTPRAELK EKFTAFCKAQ GFTEDTIVFL PQTDKCMTEQ
    
    >gi          sequence number   2  168 aa 
    APEAQVSVQP NFQPDKFLGR WFSAGLASNS SWLQEKKAAL SMCKSVVAPA ADGGFNLTST FLRKNQCETR
    TMLLQPGDSL GSYSYRSPHW GSTYSVSVVE TDYDHYALLY SQGSKGPGED FRMATLYSRT QTPRAELKEK
    FTAFCKAQGF TEDSIVFLPQ TDKCMTEQ
    
    >gi          sequence number   3  189 aa 
    MAALRMLWMG LVLLGLLGFP QTPAQGHDTV QPNFQQDKFL GRWYSAGLAS NSSWFREKKA VLYMCKTVVA
    PSTEGGLNLT STFLRKNQCE TKIMVLQPAG APGHYTYSSP HSGSIHSVSV VEANYDEYAL LFSRGTKGPG
    QDFRMATLYS RTQTLKDELK EKFTTFSKAQ GLTEEDIVFL PQPDKCIQE
    
    >gi          sequence number   4  189 aa 
    MAALPMLWTG LVLLGLLGFP QTPAQGHDTV QPNFQQDKFL GRWYSAGLAS NSSWFREKKE LLFMCQTVVA
    PSTEGGLNLT STFLRKNQCE TKVMVLQPAG VPGQYTYNSP HWGSFHSLSV VETDYDEYAF LFSKGTKGPG
    QDFRMATLYS RAQLLKEELK EKFITFSKDQ GLTEEDIVFL PQPDKCIQE
    
    >gi          sequence number   5  184 aa 
    MMRILLALSL GVACCSLWVG AEVQVQPDFQ KEKVLGKWYG IGLASNSNWF KDRKSHMKMC TTIITPTADG
    NLEVTATYPK MDRCETKSMT YFKTEQLGGF RAKSPRYGSE HDMRVVETNY DEYILMYTVK TKGSETNQIV
    SLFGRDKDLR PELLDKFQNF AKSQGLADDN IIILPHTDQC MTEA
     
    
     Table 2
     --------
    
      Frequency distribution of observed matches between all possible segments
     of length 9.
      (1) in ALL the sequences submitted
      (2) in the same sequences after shuffling their residues
      the distance between segments being calculated from the score matrix.
    
      The differences between observed and random frequencies 
      are tested by a chi-square statistic; NS:p>0.05, S:p<=0.05
    
      A significant difference indicates that similarity between AT LEAST SOME 
      sequences departs from randomness.
    
         -----------------------------------------
         Distance       (1)       (2)    Proba.
         ----------------------------------------
         288.000         1         0        NS
         306.000         6         0          S
         324.000        16         0          S
         342.000        70         0          S
         360.000        95         0          S
         378.000       101         0          S
         396.000       162         0          S
         414.000       114         0          S
         432.000       170         0          S
         450.000       154         0          S
         468.000       130         2          S
         486.000       126         7          S
         504.000       123        12          S
         522.000       137        56          S
         540.000       218       192        NS
         558.000       337       453        NS
         576.000      1415      1651        NS
         594.000      2531      2874        NS
         612.000      8405      8779        NS
         630.000     18276     19544        NS
         648.000     25118     26515        NS
         666.000     54360     55296        NS
         684.000     53085     54067        NS
         702.000     74934     73179        NS
         720.000     51321     49057        NS
         738.000     14516     14127        NS
         756.000      3522      3588        NS
         774.000       142       184        NS
         792.000         4         6        NS
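The comparison behind Table 2 can be sketched in a few lines of Python. This is only an illustration, not the Match-Box implementation: `toy_distance` is a hypothetical stand-in that counts mismatched positions, whereas the server derives its distance from blosum45, and the two test strings are short excerpts of sequences 1 and 2. The function names are invented for this sketch.

```python
import random
from collections import Counter

SEGMENT_LENGTH = 9  # segment length stated in Table 2


def segments(seq, length=SEGMENT_LENGTH):
    """Return every overlapping window of `length` residues."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def toy_distance(a, b):
    """Hypothetical distance: number of mismatched positions.

    The server computes its distance from the score matrix instead.
    """
    return sum(x != y for x, y in zip(a, b))


def distance_histogram(seqs):
    """Count segment pairs taken from different sequences at each distance."""
    hist = Counter()
    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            for a in segments(seqs[i]):
                for b in segments(seqs[j]):
                    hist[toy_distance(a, b)] += 1
    return hist


def shuffled(seq, rng):
    """Return the same residues in random order (composition is preserved)."""
    residues = list(seq)
    rng.shuffle(residues)
    return "".join(residues)


SEQ_A = "QPNFQQDKFLGRWFSAGLAS"  # excerpt of sequence 1
SEQ_B = "QPNFQPDKFLGRWFSAGLAS"  # excerpt of sequence 2

rng = random.Random(0)
observed = distance_histogram([SEQ_A, SEQ_B])                      # column (1)
randomised = distance_histogram([shuffled(s, rng) for s in (SEQ_A, SEQ_B)])  # column (2)

for d in sorted(set(observed) | set(randomised)):
    print(d, observed[d], randomised[d])
```

Small distances that occur in the observed column but not in the shuffled one play the role of the rows marked S in the table; the chi-square test itself is not reproduced here.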
    
     Figure 1
     ----------
    
      Comparison between the observed matches in the submitted sequences (*)
      and in the same sequences after shuffling (o) for ALL THE SEQUENCES.
    
      More matches than expected by chance indicate
      that similarity between at least some sequences departs from randomness.
    
    
      Log       |                                             oooooo          
      Cumulated |                                       oooooo                
      Frequency |                                      o                      
                |                                     o                       
                |                                    o                        
                |                                   o                         
                |                                                             
                |                                  o                          
                |                                 o                           
                |                            *****                            
                |                        ****    o                            
                |                       *                                     
                |                     **        o                             
                |                    *                                        
                |                              o                              
                |                   *                                         
                |                             o                               
                |                  *         o                                
                |                                                             
                |                 *         o                                 
                | oooooooooooooooooooooooooo                                  
                |_____________________________________________________________
    
                 Distance calculated from the score matrix blosum45.
    
    
     Table 3
     --------
    
      Frequency distribution of observed matches between all possible segments
     of length 9.
      (1) in the LEAST RELATED pair of sequences
      (2) in the same sequences after shuffling their residues
      the distance between segments being calculated from the score matrix.
    
      The differences between observed and random frequencies 
      are tested by a chi-square statistic; NS:p>0.05, S:p<=0.05
    
      A significant difference indicates that similarity between the least 
      related sequences departs from randomness.
    
         -----------------------------------------
         Distance       (1)       (2)    Proba.
         -----------------------------------------
         396.000         1         0        NS
         414.000         5         0          S
         432.000         7         0          S
         450.000         7         0          S
         468.000         8         0          S
         486.000        12         0          S
         504.000        20         2          S
         522.000        17         3          S
         540.000        26         6          S
         558.000        30        25        NS
         576.000        40        43        NS
         594.000       165       171        NS
         612.000       244       262        NS
         630.000       838       869        NS
         648.000      1730      1892        NS
         666.000      2560      2591        NS
         684.000      5345      5250        NS
         702.000      5456      5527        NS
         720.000      7613      7634        NS
         738.000      5477      5623        NS
         756.000      1739      1510          S
         774.000       486       432          S
         792.000        27        16          S
         810.000         3         0          S
    
     Figure 3
     ---------
    
      Comparison between the observed matches for the LEAST RELATED pair 
      of sequences (*) and in the same sequences after shuffling (o).
      The least related pair appears to be sequences  4 and  5.
      More matches than expected by chance indicate that similarity between
      the least related sequences departs from randomness.
    
      Log       |                                            ooooooo          
      Cumulated |                                        oooo*                
      Frequency |                                       o                     
                |                                      o                      
                |                                     o                       
                |                                    o                        
                |                                   *                         
                |                                   o                         
                |                                  o                          
                |                                 *                           
                |                                 o                           
                |                               **                            
                |                             ** o                            
                |                           **                                
                |                          *    o                             
                |                        **                                   
                |                              o                              
                |                       *     o                               
                |                            o                                
                |                      *                                      
                |oooooooooooooooooooooooooooo                                 
                |_____________________________________________________________
    
                 Distance calculated from the score matrix blosum45. 
    
     Table 4
     --------
    
     SIMILARITY MATRIX between the sequences
    
      1) The coefficient Rij (0 <= Rij <=1) is the proportion of segments
         of sequence i matching with at least one segment of sequence j.
    
      2) An asterisk marks the pairs of sequences with more matches
         than expected by chance.
    
      Sequences       1     2     3     4     5
    
     1 gi          1.00* 0.91* 0.97* 0.97* 0.90*                                
     2 gi          1.00* 1.00* 1.00* 1.00* 0.90*                                
     3 gi          0.96* 0.89* 1.00* 1.00* 0.85*                                
     4 gi          0.96* 0.90* 1.00* 1.00* 0.84*                                
     5 gi          0.86* 0.86* 0.86* 0.88* 1.00*                                
    
    
      = Computational notes =
    
      1)  Matches are defined with respect to a statistical cutoff.
      To get an optimal discrimination, it is computed as the average
      of the cutoff at which random noise appears and the one
      at which it equals the signal observed between identical sequences.
      Thus a very low, or even numerically null, coefficient may
      be associated with a significant difference when
      only short segments of a pair of sequences appear very similar.
    
      2)  This matrix is not symmetrical: 
        If a sequence i is shorter than a sequence j, then sequence i
        can be very similar to a part of sequence j, but sequence j
        can be only partly similar to sequence i, so that Rij > Rji.
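The definition of Rij and the asymmetry described in note 2 can be illustrated as follows. This is a minimal sketch under stated assumptions: `exact_match` is a hypothetical matching criterion (Match-Box uses a statistical cutoff on a score-matrix distance), the segment length of 9 is taken from Tables 2 and 3, and the two strings are excerpts of sequence 1 chosen so that the short one lies inside the long one.

```python
def segments(seq, length=9):
    """Return every overlapping window of `length` residues."""
    return [seq[i:i + length] for i in range(len(seq) - length + 1)]


def exact_match(a, b):
    """Hypothetical criterion: two segments match only if identical."""
    return a == b


def similarity_coefficient(seq_i, seq_j, matches=exact_match, length=9):
    """Proportion of segments of seq_i matching at least one segment of seq_j."""
    segs_i = segments(seq_i, length)
    segs_j = segments(seq_j, length)
    if not segs_i:
        return 0.0
    hits = sum(any(matches(a, b) for b in segs_j) for a in segs_i)
    return hits / len(segs_i)


short_seq = "QPNFQQDKFLGR"                   # 12 residues, 4 segments
long_seq = "AAPEAQVSVQPNFQQDKFLGRWFSAGLAS"   # 29 residues, 21 segments

r_short_long = similarity_coefficient(short_seq, long_seq)
r_long_short = similarity_coefficient(long_seq, short_seq)
print(r_short_long, r_long_short)
```

Every segment of the short sequence is found in the long one, but only a few segments of the long sequence are found in the short one, so the coefficient is larger in the short-to-long direction.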
    
     Table 5
     --------
    
      The similarity matrix is treated by principal coordinates analysis to
      produce a graphical representation of the similarity between the sequences.
    
      Sequences are represented in a three-dimensional space, each factor 
      being associated with a percentage of the total variability between the sequences.
      The first factor is generally trivial, and the grouping of the sequences is
      performed in the plane of factors 2 & 3.
    
      Sequences |  95.9 %   3.6 %   0.8 % of variability between the sequences
    
      gi        |  -0.988   0.021   0.157
      gi        |  -1.002   0.070   0.029
      gi        |  -0.988   0.142  -0.070
      gi        |  -0.990   0.118  -0.081
      gi        |  -0.926  -0.376  -0.038
    
     Figure 4
     ----------
    
      Graphical representation of the sequences in the plane of factors 2 & 3.
      Superimposed labels are printed in uppercase and listed below.
      Use the landscape output format to increase the resolution.
    
     *: y=   0.157 x=  -0.376                              *: x=   0.142 y=   0.157
     *___________________________________________________gi_______________________*
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                         gi________         |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |                                                                            |
     |gi________                                                                  |
     |                                                                            |
     |                                                                            |
     *___________________________________________________________________GI========
     *: y=  -0.081 x=  -0.376                              *: x=   0.142 y=  -0.081
    
    
      Superimposed labels
      ___________________
    
      Printed    row  col  Superimposed  
       label      #    #      labels     
     ____________________________________
     GI========    20  69     gi________
     ___________________________________________________________________________
     MATCH-BOX_server 1.3                              13-May-99    20:01:05
      Execution successful
     ___________________________________________________________________________
    
    

    Alignment Report File

    
     ___________________________________________________________________________
     MATCH-BOX_server 1.3                               13-May-99    20:01:10
    
      Molecular Biology - University of NAMUR - BELGIUM
    
     Internet address: matchbox@biq.fundp.ac.be
     WEB: http://www.fundp.ac.be/sciences/biologie/bms/matchbox_submit.html
    
     ___________________________________________________________________________
    
         A        L         II         GGG       N     N
       A   A      L                  G    G      N N   N
      A     A     L         II       G           N  N  N
      AAAAAAA     L         II       G  GGG      N   N N
      A     A     L         II       G     G     N    NN
      A     A     LLLLLL    II        GGGGG      N     N
     ___________________________________________________________________________
    
    
      Please choose the font Monaco or Courier 11.
     Table 1: submitted set of  5 sequences
     ----------------------------------------
    
    
    >gi          sequence number   1  190 aa 
    MATHHTLWMG LALLGVLGDL QAAPEAQVSV QPNFQQDKFL GRWFSAGLAS NSSWLREKKA ALSMCKSVVA
    PATDGGLNLT STFLRKNQCE TRTMLLQPAG SLGSYSYRSP HWGSTYSVSV VETDYDQYAL LYSQGSKGPG
    EDFRMATLYS RTQTPRAELK EKFTAFCKAQ GFTEDTIVFL PQTDKCMTEQ
    
    >gi          sequence number   2  168 aa 
    APEAQVSVQP NFQPDKFLGR WFSAGLASNS SWLQEKKAAL SMCKSVVAPA ADGGFNLTST FLRKNQCETR
    TMLLQPGDSL GSYSYRSPHW GSTYSVSVVE TDYDHYALLY SQGSKGPGED FRMATLYSRT QTPRAELKEK
    FTAFCKAQGF TEDSIVFLPQ TDKCMTEQ
    
    >gi          sequence number   3  189 aa 
    MAALRMLWMG LVLLGLLGFP QTPAQGHDTV QPNFQQDKFL GRWYSAGLAS NSSWFREKKA VLYMCKTVVA
    PSTEGGLNLT STFLRKNQCE TKIMVLQPAG APGHYTYSSP HSGSIHSVSV VEANYDEYAL LFSRGTKGPG
    QDFRMATLYS RTQTLKDELK EKFTTFSKAQ GLTEEDIVFL PQPDKCIQE
    
    >gi          sequence number   4  189 aa 
    MAALPMLWTG LVLLGLLGFP QTPAQGHDTV QPNFQQDKFL GRWYSAGLAS NSSWFREKKE LLFMCQTVVA
    PSTEGGLNLT STFLRKNQCE TKVMVLQPAG VPGQYTYNSP HWGSFHSLSV VETDYDEYAF LFSKGTKGPG
    QDFRMATLYS RAQLLKEELK EKFITFSKDQ GLTEEDIVFL PQPDKCIQE
    
    >gi          sequence number   5  184 aa 
    MMRILLALSL GVACCSLWVG AEVQVQPDFQ KEKVLGKWYG IGLASNSNWF KDRKSHMKMC TTIITPTADG
    NLEVTATYPK MDRCETKSMT YFKTEQLGGF RAKSPRYGSE HDMRVVETNY DEYILMYTVK TKGSETNQIV
    SLFGRDKDLR PELLDKFQNF AKSQGLADDN IIILPHTDQC MTEA
    
      The basic principle of Match-Box is to delineate boxes of similar segments in 
      ALL the sequences. Within a box, every segment is significantly similar to
      every other one. Similarity between segments is computed from the scoring
      matrix, and the matching criterion is defined by a statistical cutoff.
      The current score matrix used for the sequence analysis is 
      blosum45.sco                           
    
      In the final alignment, the selected boxes are only a subset of all the 
      boxes found. Boxes incompatible with the proposed alignment, if any, are
      rejected. Table 2 shows how many boxes have been selected and rejected,
      and their length. Table 3 displays the boxes selected in the final alignment.
      In a successful alignment, rejected boxes are normally short boxes.
      A large rejected box would be an indication of a possible misalignment.
    
    
      Table 2: Boxes length distribution
     ------------------------------------
         Length         Frequency
                  Selected     Rejected 
          39           1           0
         127           1           0
    
      Table 3
     --------
      Boxes selected for the optimal alignment
    
       (1) box number
       (2) pattern of gaps
       (3) first residue number
       (4) sequences
       (5) last residue number
    
         1    22    23 apeaqvsvqpnfqqdkflgrwfsaglasnsswlrekkaalsmcksvvapatdgg     76
         1     0     1 apeaqvsvqpnfqpdkflgrwfsaglasnsswlqekkaalsmcksvvapaadgg     54
         1    22    23 paqghdtvqpnfqqdkflgrwysaglasnsswfrekkavlymcktvvapstegg     76
         1    22    23 paqghdtvqpnfqqdkflgrwysaglasnsswfrekkellfmcqtvvapstegg     76
         1    17    18 wvgaevqvqpdfqkekvlgkwygiglasnsnwfkdrkshmkmcttiitptadgn     71
    
         1    22    77 lnltstflrknqcetrtmllqpagslgsysyrsphwgstysvsvvetdydqyal    130
         1     0    55 fnltstflrknqcetrtmllqpgdslgsysyrsphwgstysvsvvetdydhyal    108
         1    22    77 lnltstflrknqcetkimvlqpagapghytyssphsgsihsvsvveanydeyal    130
         1    22    77 lnltstflrknqcetkvmvlqpagvpgqytynsphwgsfhslsvvetdydeyaf    130
         1    17    72 levtatypkmdrcetksmtyfkteqlggfraksprygsehdmrvvetnydeyil    125
    
         1    22   131 lysqgskgpgedfrmatly    149
         1     0   109 lysqgskgpgedfrmatly    127
         1    22   131 lfsrgtkgpgqdfrmatly    149
         1    22   131 lfskgtkgpgqdfrmatly    149
         1    17   126 mytvktkgsetnqivslfg    144
    
         2    22   151 rtqtpraelkekftafckaqgftedtivflpqtdkcmte    189
         2     0   129 rtqtpraelkekftafckaqgftedsivflpqtdkcmte    167
         2    22   151 rtqtlkdelkekfttfskaqglteedivflpqpdkciqe    189
         2    22   151 raqllkeelkekfitfskdqglteedivflpqpdkciqe    189
         2    16   145 rdkdlrpelldkfqnfaksqgladdniiilphtdqcmte    183
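The rows of Table 3 can be cross-checked against Table 2 with a short script. The sketch below is an assumption-laden reading of the layout, not a documented format: it treats each row as five whitespace-separated fields in the order of the legend, leaves the "pattern of gaps" field uninterpreted, and uses only the four rows of sequence 1 copied from the table. The names `parse_row` and `ROWS_SEQ1` are invented for the illustration.

```python
from collections import defaultdict

# Rows of Table 3 for sequence 1, copied from the report.
ROWS_SEQ1 = """\
1 22 23 apeaqvsvqpnfqqdkflgrwfsaglasnsswlrekkaalsmcksvvapatdgg 76
1 22 77 lnltstflrknqcetrtmllqpagslgsysyrsphwgstysvsvvetdydqyal 130
1 22 131 lysqgskgpgedfrmatly 149
2 22 151 rtqtpraelkekftafckaqgftedtivflpqtdkcmte 189
"""


def parse_row(line):
    """Split one row into (box, gap_field, first, segment, last)."""
    box, gap_field, first, segment, last = line.split()
    return int(box), int(gap_field), int(first), segment, int(last)


box_lengths = defaultdict(int)
for line in ROWS_SEQ1.splitlines():
    box, gap_field, first, segment, last = parse_row(line)
    # The residue numbers should span exactly the printed segment.
    assert last - first + 1 == len(segment)
    box_lengths[box] += len(segment)

box_lengths = dict(box_lengths)
print(box_lengths)
```

Summing the row lengths per box number gives the two box lengths listed in Table 2 (127 and 39), which suggests that long boxes are simply printed in chunks of at most 54 residues.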
    
    
     Table 4 : optimal multiple alignment with indices of reliability
     ----------------------------------------------------------------
    
    
      Sequences number, length and name
      _________________________________
    
       1   190 gi           2   168 gi           3   189 gi        
       4   189 gi           5   184 gi        
    
                  10        20        30        40        50        60        70
                   +         +         +         +         +         +         +
       1  MATHHTLWMGLALLGVLGDLQAapeaqvsvqpnfqqdkflgrwfsaglasnsswlrekkaalsmcksvva
       2  ----------------------apeaqvsvqpnfqpdkflgrwfsaglasnsswlqekkaalsmcksvva
       3  MAALRMLWMGLVLLGLLGFPQTpaqghdtvqpnfqqdkflgrwysaglasnsswfrekkavlymcktvva
       4  MAALPMLWTGLVLLGLLGFPQTpaqghdtvqpnfqqdkflgrwysaglasnsswfrekkellfmcqtvva
       5  -----MMRILLALSLGVACCSLwvgaevqvqpdfqkekvlgkwygiglasnsnwfkdrkshmkmcttiit
    
                                977444422222222222222222222222222244444444444444
    
                  80        90       100       110       120       130       140
                   +         +         +         +         +         +         +
       1  patdgglnltstflrknqcetrtmllqpagslgsysyrsphwgstysvsvvetdydqyallysqgskgpg
       2  paadggfnltstflrknqcetrtmllqpgdslgsysyrsphwgstysvsvvetdydhyallysqgskgpg
       3  pstegglnltstflrknqcetkimvlqpagapghytyssphsgsihsvsvveanydeyallfsrgtkgpg
       4  pstegglnltstflrknqcetkvmvlqpagvpgqytynsphwgsfhslsvvetdydeyaflfskgtkgpg
       5  ptadgnlevtatypkmdrcetksmtyfkteqlggfraksprygsehdmrvvetnydeyilmytvktkgse
    
          4444444444444444444444444447799777777444444444444222222222224444777777
    
                 150       160       170       180       190       200       210
                   +         +         +         +         +         +         +
       1  edfrmatlySrtqtpraelkekftafckaqgftedtivflpqtdkcmteQ
       2  edfrmatlySrtqtpraelkekftafckaqgftedsivflpqtdkcmteQ
       3  qdfrmatlySrtqtlkdelkekfttfskaqglteedivflpqpdkciqe-
       4  qdfrmatlySraqllkeelkekfitfskdqglteedivflpqpdkciqe-
       5  tnqivslfg-rdkdlrpelldkfqnfaksqgladdniiilphtdqcmteA
    
          777799999 444444444444444444444444444444444444444                     
    
    
      Table 4 : Aligned residues (included in boxes) are printed in 
      lowercase. Other residues (uppercase) are NOT aligned.
    
      Only the multiple alignment of the WHOLE set of sequences is performed.
    
      RELIABILITY SCORES
    
      A score from 1 to 9 is written below each position in the boxes.
      It is related to the statistical significance of the alignment at this
      position. A score of 5 corresponds to a similarity of equal occurrence in 
      related and unrelated sequences. The lower the score, the higher the
      reliability of the alignment. As an example, the following results have been
      obtained on 20 families of known structures sharing between 9% and 44% of
      conserved residues.
    
      Percentage of correctly predicted aligned residues obtained in TESTS:
    
      Reliability    Minimum       Maximum
         Score         %             %
      ------------------------------------
          6           41.3          86.8
          5           48.8          100
          4           73.9          100
          3           84.6          100
          2           100           100       
      ------------------------------------
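One way to use the reliability scores is to keep only the columns whose score is at or below a chosen threshold. The sketch below assumes that the score line is column-aligned with the alignment rows once the row prefix (sequence number and padding) has been stripped; the function name is hypothetical, and the strings are the first 14 columns of the third alignment block of Table 4.

```python
def columns_at_or_below(score_line, max_score):
    """Return the 0-based columns whose reliability digit is <= max_score.

    Blank positions (unaligned residues) carry no score and are skipped.
    """
    return [col for col, ch in enumerate(score_line)
            if ch.isdigit() and int(ch) <= max_score]


# First 14 columns of the third block (row 1 and the score line below it).
row = "edfrmatlySrtqt"
scores = "777799999 4444"

reliable = columns_at_or_below(scores, 4)
kept = "".join(row[col] for col in reliable)
print(reliable, kept)
```

With a threshold of 4, only the start of the second box survives in this excerpt; the residues scored 7 and 9 at the end of the first box are dropped.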
    
      GAPS
    
    
      When lowercase amino acids are aligned to gaps, it means that the position 
      of the gaps is not completely defined. If two successive selected boxes
      overlap by a maximum of k amino acids in one of the sequences,
      the final alignment will show a gap aligned with lowercase amino acids.
      Part of this gap, or the whole gap, can then be moved
      to the right by r positions (r being less than or equal to k).
      This means that Match-Box is not able to fix the exact position of this
      gap, but that the gap can be placed somewhere to the right within 
      a range of k amino acids.
    
      Please refer to Table 3 for the precise limits of the boxes.
      You may resubmit a subset of your sequences in order to
      refine the within-group alignment. Results of EXPLORE may help you
      in defining groups of sequences.
    
      REFERENCE
    
      Match-Box_server: a multiple sequence alignment tool placing emphasis
      on reliability. E. Depiereux, G. Baudoux, P. Briffeuil, I. Reginster,
      X. De Bolle, C. Vinals and E. Feytmans (1997) CABIOS 13(3) 249-256.
    
      A PostScript file with the boxes outlined can be obtained.
     ___________________________________________________________________________
     MATCH-BOX_server 1.3                               13-May-99    20:01:10
      Execution successful
     ___________________________________________________________________________
    
    
    
