BEAUTY information is also added to the NCBI Gapped BLASTP and W. R. Pearson FASTA searches of the NCBI nr protein database provided by the Human Genome Center, Baylor College of Medicine.
BEAUTY incorporates figures summarizing the information on the locations of local hits and any annotated domains and sites directly into BLAST search results. These enhancements make it much easier to detect weak, but functionally significant, matches in BLAST database searches. In addition, the time needed for a scientist to fully evaluate the BEAUTY search results is much less than the time needed to evaluate a comparable BLAST search result.
The BEAUTY Post-Processor can be used with any of the protein database searches provided by the NCBI BLAST server. The NCBI non-redundant (nr) protein database is the default. Other standard NCBI protein sequence databases can be searched from the BLASTP + BEAUTY and BLASTP/BLASTX + BEAUTY option pages for the BEAUTY Post-Processor.
A database of annotated domains and sites was created for use with the BEAUTY Post-Processor by:
1) scanning the Entrez database for protein sequences with annotations describing known domains and sites within the sequence,
2) matching each Entrez sequence against the sequence motifs in the PROSITE pattern database and storing the location of each hit,
3) extracting the locations of the conserved blocks within the sequences represented in the BLOCKS database,
4) extracting the locations of the domains identified in the sequences in the PRINTS protein fingerprint database, and
5) extracting the locations of the domains identified in the sequences in Pfam, the protein families database of alignments and HMMs.
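The PROSITE matching step can be sketched as follows. This is a minimal illustration, not the BEAUTY implementation: it assumes a simplified subset of PROSITE pattern syntax, the function names `prosite_to_regex` and `scan` are hypothetical, and the example sequence is invented. The pattern shown is the one commonly published for the ATP_GTP_A (P-loop) motif.

```python
import re

def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):          # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):            # C-terminal anchor
            suffix, element = "$", element[:-1]
        count = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:                                # repetition, e.g. x(4) or x(2,4)
            element, count = m.group(1), "{" + m.group(2) + "}"
        if element == "x":                   # any residue
            core = "."
        elif element.startswith("{"):        # excluded residues, e.g. {P}
            core = "[^" + element[1:-1] + "]"
        else:                                # single residue or [ABC] class
            core = element
        parts.append(prefix + core + count + suffix)
    return "".join(parts)

def scan(sequence: str, pattern: str):
    """Return 1-based inclusive (start, end) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start(1) + 1, m.end(1)) for m in regex.finditer(sequence)]

# Hypothetical sequence containing one P-loop-like motif.
P_LOOP = "[AG]-x(4)-G-K-[ST]."
hits = scan("MSLAGPSGSGKSTLL", P_LOOP)
print(hits)
```

Storing the hits as 1-based inclusive ranges keeps them in the same `start..end` form used in the annotation listings on this page.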
BEAUTY incorporates information on the locations of any annotated domains and sites directly into BLAST search results:
1) A figure is added showing the relative location of each hit (HSP) within the query sequence, with each accession number linked to the corresponding individual report listed below the figure. In addition, the query sequence is matched against the PROSITE pattern database, and the locations of all pattern matches within the query sequence are displayed:
Locally-aligned regions (HSPs) with respect to query sequence:
Locus_ID
gi|44804|lcl|2 | __________ _______
sp|P13186|KIN2 | ____ ____ _______
sp|P27704|ERK3 | ___________
gi|4229|lcl|13 | _______ ________
gi|393281|lcl| | ________
sp|P32361|IRE1 | ________
gi|450233|lcl| | ____ _______
pir||B40466|gi | ________
sp|P08414|KCC4 | _______
gi|306479|lcl| | _______
sp|P13185|KIN1 | ____ ____ _______
Prosite Hits: __
__________________________________________________
Query sequence: | | | | | | 224
0 50 100 150 200
__________________
Prosite hits:
PROTEIN_KINASE_TYR Tyrosine protein kinases specific active 138..150
__________________
2) A figure is added for each BLAST hit showing:
a) the positions of the local hits (HSPs) and
b) the location of any annotated domains and sites within
each matched sequence, e.g.,:
Local hits (HSPs): ____________ ____ ______
Annotated Domains: ______ _______
__________________________________________________
Database sequence: | | | | | | | 271
0 50 100 150 200 250
__________________
Annotated Domains:
Entrez np-binding site: ATP. 40..47
BLOCKS ABC_TRANSPORTER: ABC transporters family 23..53
BLOCKS ABC_TRANSPORTER: ABC transporters family 144..175
PROSITE ATP_GTP_A: ATP/GTP-binding site motif A 40..47
PROSITE ABC_TRANSPORTER: ABC transporters family 144..158
__________________
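The text figures above place each range on a fixed-width line scaled to the sequence length. The sketch below shows one way such a track could be drawn; it is an illustration under stated assumptions, not BEAUTY's own code. The 50-column width is read off the ruler in the figures, and the function name `render_track` and the rounding rule are hypothetical.

```python
def render_track(intervals, seq_length, width=50):
    """Draw 1-based inclusive ranges as underscores on a line of `width` columns."""
    cells = [" "] * width
    for start, end in intervals:
        # Scale residue positions to column indices (integer floor).
        first = (start - 1) * width // seq_length
        last = (end - 1) * width // seq_length
        for col in range(first, last + 1):
            cells[col] = "_"
    return "".join(cells).rstrip()

# PROSITE hit 138..150 on the 224-residue query from the first figure.
prosite_line = render_track([(138, 150)], seq_length=224)
print("Prosite Hits:   | " + prosite_line)
```

Because positions are floored to columns, short sites still occupy at least one column, so they remain visible on long sequences.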
In summary, by incorporating annotated domain and site information directly into BLAST search results, BEAUTY can greatly improve the identification of weak, but functionally significant, matches in BLAST database searches.
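As an illustration of why this helps, the following sketch lists the annotated domains that fall within locally aligned regions of a database sequence. It is an assumption about how one might use the coordinates, not part of BEAUTY: the names `Domain` and `domains_covered` and the HSP coordinates are hypothetical, while the domain entries are taken from the example listing above.

```python
from typing import NamedTuple

class Domain(NamedTuple):
    source: str
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive

def domains_covered(hsps, domains):
    """Return the domains that overlap at least one HSP range."""
    return [d for d in domains
            if any(s <= d.end and d.start <= e for s, e in hsps)]

# Annotations from the example listing on this page.
DOMAINS = [
    Domain("Entrez", "np-binding site: ATP", 40, 47),
    Domain("BLOCKS", "ABC_TRANSPORTER", 23, 53),
    Domain("BLOCKS", "ABC_TRANSPORTER", 144, 175),
    Domain("PROSITE", "ATP_GTP_A", 40, 47),
    Domain("PROSITE", "ABC_TRANSPORTER", 144, 158),
]

# Hypothetical HSP ranges on the database sequence.
HSPS = [(30, 60), (200, 230)]

for d in domains_covered(HSPS, DOMAINS):
    print(f"{d.source:8} {d.name:24} {d.start}..{d.end}")
```

A weak alignment that covers an annotated ATP-binding site is a stronger lead than one of equal score that covers no annotated feature, which is the kind of judgment the figures are meant to support.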
References:
Kim C. Worley, Brent A. Wiese, and Randall F. Smith (1996). Post-Processing BLAST Search Results using BEAUTY. In preparation.
Kim C. Worley, Brent A. Wiese, and Randall F. Smith (1995). BEAUTY: An enhanced BLAST-based search tool that integrates multiple biological information resources into sequence similarity search results. Genome Research 5:173-184.
Last modified: Tue Dec 3 12:13:22 CST 2002