BEAUTY Post-Processor Program Description

The BEAUTY (BLAST Enhanced Alignment Utility) Post-Processor annotates BLASTX 1.4 search results returned by the NCBI BLAST server with figures showing the locations of local hits and of any annotated domains and sites.

BEAUTY information is likewise added to NCBI's Gapped BLASTP and W. R. Pearson's FASTA searches of NCBI's NR Protein Database provided by the Human Genome Center, Baylor College of Medicine.

BEAUTY incorporates figures summarizing the information on the locations of local hits and any annotated domains and sites directly into BLAST search results. These enhancements make it much easier to detect weak, but functionally significant, matches in BLAST database searches. In addition, the time needed for a scientist to fully evaluate the BEAUTY search results is much less than the time needed to evaluate a comparable BLAST search result.

The BEAUTY Post-Processor can be used with any of the protein database searches provided by the NCBI BLAST server. The NCBI's Non-redundant (nr) protein database is the default database. Other standard protein sequence databases from the NCBI are available for searches using the BLASTP + BEAUTY and BLASTP/BLASTX + BEAUTY option pages for the BEAUTY Post-Processor.

A database of annotated domains and sites was created for use with the BEAUTY Post-Processor by:

1) scanning the Entrez database for protein sequences with annotations describing known domains and sites within the sequence;
2) matching each Entrez sequence against the sequence motifs in the PROSITE pattern database and storing the location of each hit;
3) extracting the locations of the conserved blocks within the sequences represented in the BLOCKS database;
4) extracting the locations of the domains identified in the sequences in the PRINTS protein fingerprint database; and
5) extracting the locations of the domains identified in the sequences in Pfam, the protein families database of alignments and HMMs.
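
Step 2 above, matching a sequence against PROSITE motifs and storing the location of each hit, can be illustrated with a short Python sketch. This is not BEAUTY's own code: the function names prosite_to_regex and scan are hypothetical, the test sequence is made up, and the translation covers only the common PROSITE pattern elements (it assumes '<' and '>' never appear inside square brackets and does not report overlapping matches). The pattern used is the ATP_GTP_A motif, [AG]-x(4)-G-K-[ST].

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression.

    Simplified sketch: handles '-', 'x', [..], {..}, (n), (n,m), '<' and '>'.
    """
    regex = pattern.rstrip(".")                           # patterns end with a period
    regex = regex.replace("-", "")                        # '-' only separates elements
    regex = regex.replace("{", "[^").replace("}", "]")    # {P}    -> [^P]
    regex = regex.replace("(", "{").replace(")", "}")     # x(2,4) -> x{2,4}
    regex = regex.replace("x", ".")                       # x      -> any residue
    regex = regex.replace("<", "^").replace(">", "$")     # terminus anchors
    return regex


def scan(sequence, pattern):
    """Return (start, end) of each non-overlapping match, 1-based and inclusive."""
    compiled = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end()) for m in compiled.finditer(sequence)]


# Hypothetical sequence containing one P-loop-like stretch (GPSGSGKS).
sequence = "MAAAGPSGSGKSTLLAA"
for start, end in scan(sequence, "[AG]-x(4)-G-K-[ST]."):
    print(f"ATP_GTP_A  {start}..{end}")
```

Reporting coordinates as 1-based, inclusive ranges keeps them in the same start..end form that the BEAUTY reports use for annotated sites.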

BEAUTY incorporates information on the locations of any annotated domains and sites directly into BLAST search results:

1) A figure is added showing the relative location of each hit (HSP) within the query sequence, with each accession number linked to the corresponding individual report listed below. In addition, the query sequence is matched against the PROSITE pattern database, and the locations of all pattern matches within the query sequence are displayed:

Locally-aligned regions (HSPs) with respect to query sequence:

Locus_ID        
gi|44804|lcl|2  |                __________ _______                
sp|P13186|KIN2  |  ____        ____         _______                
sp|P27704|ERK3  |                         ___________              
gi|4229|lcl|13  |                   _______    ________            
gi|393281|lcl|  |                          ________                
sp|P32361|IRE1  |                          ________                
gi|450233|lcl|  |      ____                 _______                
pir||B40466|gi  |                             ________             
sp|P08414|KCC4  |                           _______                
gi|306479|lcl|  |                           _______   
sp|P13185|KIN1  |  ____        ____         _______                

Prosite Hits:                                   __                 
                 __________________________________________________
Query sequence: |          |          |          |           |     | 224
                0         50        100        150         200
__________________
Prosite hits:
   PROTEIN_KINASE_TYR   Tyrosine protein kinases specific active 138..150
__________________

2) A figure is added for each BLAST hit showing:
a) the positions of the local hits (HSPs) and
b) the locations of any annotated domains and sites within each matched sequence, for example:

Local hits (HSPs):       ____________          ____  ______               
Annotated Domains:          ______                _______                 
                        __________________________________________________
Database sequence:     |         |        |        |        |        |    | 271
                       0        50      100      150      200      250
__________________

Annotated Domains:
   Entrez               np-binding site: ATP.                    40..47
   BLOCKS               ABC_TRANSPORTER: ABC transporters family 23..53
   BLOCKS               ABC_TRANSPORTER: ABC transporters family 144..175
   PROSITE              ATP_GTP_A: ATP/GTP-binding site motif A  40..47
   PROSITE              ABC_TRANSPORTER: ABC transporters family 144..158
__________________
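
The figures above map residue ranges onto a fixed-width text ruler so that a reader can see at a glance where local hits and annotated domains coincide. The Python sketch below shows one way such a track could be drawn and how the same overlap could be checked programmatically. It is an illustration under stated assumptions, not BEAUTY's actual rendering code: the names Annotation, render_track and overlapping are hypothetical, the linear scaling to 50 columns is assumed, all coordinates are taken to be 1-based, inclusive, and no larger than the sequence length, and the HSP coordinates are invented. The annotation coordinates are copied from the example report.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    source: str        # e.g. "Entrez", "BLOCKS", "PROSITE"
    description: str
    start: int         # 1-based, inclusive
    end: int           # 1-based, inclusive


def render_track(intervals, seq_length, width=50):
    """Draw '_' in every column touched by at least one (start, end) interval."""
    cells = [" "] * width
    for start, end in intervals:
        first = (start - 1) * width // seq_length   # leftmost column covered
        last = (end - 1) * width // seq_length      # rightmost column covered
        for col in range(first, last + 1):
            cells[col] = "_"
    return "".join(cells)


def overlapping(annotations, hsps):
    """Return the annotations that share at least one residue with any HSP."""
    return [
        ann for ann in annotations
        if any(ann.start <= h_end and h_start <= ann.end for h_start, h_end in hsps)
    ]


# Annotated domains copied from the example report above.
annotations = [
    Annotation("Entrez", "np-binding site: ATP.", 40, 47),
    Annotation("BLOCKS", "ABC_TRANSPORTER: ABC transporters family", 23, 53),
    Annotation("BLOCKS", "ABC_TRANSPORTER: ABC transporters family", 144, 175),
    Annotation("PROSITE", "ATP_GTP_A: ATP/GTP-binding site motif A", 40, 47),
    Annotation("PROSITE", "ABC_TRANSPORTER: ABC transporters family", 144, 158),
]
hsps = [(30, 90)]      # hypothetical HSP coordinates, not taken from the report
seq_length = 271       # length of the database sequence in the example

print("Local hits (HSPs): " + render_track(hsps, seq_length))
print("Annotated Domains: "
      + render_track([(a.start, a.end) for a in annotations], seq_length))
for ann in overlapping(annotations, hsps):
    print(f"   {ann.source:<20} {ann.description:<41}{ann.start}..{ann.end}")
```

A weak hit whose HSP falls on an annotated domain, as the hypothetical HSP does for the ATP-binding site here, is the kind of match the figures are meant to make easy to spot.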

In summary, by incorporating annotated domain and site information directly into BLAST search results, BEAUTY can greatly improve the identification of weak, but functionally significant, matches in BLAST database searches.

References:

Kim C. Worley, Brent A. Wiese, and Randall F. Smith (1996). Post-Processing BLAST Search Results using BEAUTY. In preparation.

Kim C. Worley, Brent A. Wiese, and Randall F. Smith (1995). BEAUTY: An enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Research 5:173-184.



Credits:
NCBI BLAST Notebook: Herve Recipon (recipon@ncbi.nlm.nih.gov) and Warren Gish
BEAUTY Post-processing: Kim C. Worley, Pamela A. Culpepper

Last modified: Tue Dec 3 12:13:22 CST 2002
BCM HGSC