WU-BLASTX+BEAUTY / nr protein - Warren Gish's BLAST with gapped alignments, RepeatMasker, and BEAUTY post-processing (WU/BCM) [H] [O] [P] [E]
Make sure the selection circle/diamond is clicked.
Note: Sequence formats
The search engines used by the BCM Search Launcher accept FASTA format. If your sequence is in another format, it can be converted with Readseq: the Search Launcher includes a utility that converts any Readseq-acceptable sequence into FASTA format.
From the BCM Search Launcher page, choose Sequence utilities. Input your sequence and select the first search option (Readseq).
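For readers who have only a raw sequence string, the following minimal sketch shows what a FASTA record looks like: a header line beginning with ">" followed by the sequence on wrapped lines. It is not the Readseq utility; the function name `to_fasta`, the record name, and the line width are assumptions made for illustration.

```python
def to_fasta(name, sequence, width=60):
    """Return a FASTA record: a '>' header line followed by wrapped sequence lines."""
    # Drop whitespace and position numbers that often come along with pasted sequences.
    cleaned = "".join(ch for ch in sequence if ch.isalpha()).upper()
    # Wrap the sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical query: a short DNA fragment pasted with spaces and a position number.
record = to_fasta("my_query", "atgg ccat 10 tgta atgg", width=10)
print(record)
```

The resulting text can be pasted directly into a search form that expects FASTA input.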
Notes on this type of search:
WU-BLASTX (Washington University BLASTX) is a search program that compares a DNA sequence, translated in all possible reading frames, to a selected protein database. Thus, it compares the six possible translation products of a nucleotide query sequence (three reading frames on each of the two strands). Translating the DNA sequence prior to searching greatly improves the accuracy of the search, as true sequence similarity is much easier to detect in proteins than in DNA. In DNA, the rapidly mutating third base position of each codon and the large non-coding regions limit the effectiveness of direct DNA sequence comparisons in detecting non-identical matches.
Other programs that search translated DNA queries against protein databases include BLASTX 2.0 from NCBI and FASTX (University of Virginia).
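The six-frame translation described above can be sketched as follows. This is an illustration under stated assumptions, not the code used by WU-BLASTX: it uses the standard genetic code, reports unrecognized codons as "X", and the function names (`reverse_complement`, `translate`, `six_frames`) and the example sequence are made up for this sketch.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons from the start of dna; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Return the six translation products: frames +1..+3 and -1..-3."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames["+%d" % (offset + 1)] = translate(dna[offset:])
        frames["-%d" % (offset + 1)] = translate(rc[offset:])
    return frames


for frame, protein in sorted(six_frames("ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG").items()):
    print(frame, protein)
```

Each of the six protein strings would then be compared against the protein database, which is why a coding region can be found regardless of its strand or frame.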
BLAST uses a heuristic search algorithm: it starts with a quick, but less accurate, exact-match procedure to identify possible matches from the entire database, and then applies a more sensitive but time-consuming dynamic-programming alignment, in the style of the Smith-Waterman algorithm, to these candidates to create the final output. Details of BLAST search strategy
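The two-stage idea (a cheap exact-match scan to find candidates, followed by a more careful look at each candidate) can be illustrated with a toy seed-and-extend sketch. This is not BLAST's actual implementation: it seeds on identical words only, and the second stage here is a simple ungapped extension with a drop-off cutoff rather than a gapped dynamic-programming alignment. The function names, word size, scores, and sequences are all assumptions chosen for illustration.

```python
def find_seeds(query, subject, word_size=3):
    """Yield (query_pos, subject_pos) pairs where a word of word_size matches exactly."""
    index = {}
    for i in range(len(query) - word_size + 1):
        index.setdefault(query[i:i + word_size], []).append(i)
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], []):
            yield i, j


def extend_ungapped(query, subject, qi, sj, word_size=3, match=1, mismatch=-2, x_drop=3):
    """Extend a seed in both directions without gaps.

    Extension stops when the running score falls x_drop below the best score seen.
    Returns (score, query_start, subject_start, length).
    """
    def walk(q, s, step):
        best = score = best_len = n = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score >= x_drop:
                break
            q += step
            s += step
        return best, best_len

    right_score, right_len = walk(qi + word_size, sj + word_size, 1)
    left_score, left_len = walk(qi - 1, sj - 1, -1)
    score = word_size * match + left_score + right_score
    return score, qi - left_len, sj - left_len, left_len + word_size + right_len


# Hypothetical protein fragments differing at one position (I versus L).
query = "MKTAYIAKQR"
subject = "GGMKTAYLAKQRGG"
best = max(extend_ungapped(query, subject, qi, sj) for qi, sj in find_seeds(query, subject))
print(best)  # (score, query_start, subject_start, length)
```

The design point is that the expensive step runs only on the few database positions that pass the cheap exact-match filter, which is what makes a whole-database search practical.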
"with gapped alignments" means that this search engine recognizes gaps in sequence alignments and incorporates gap costs cooresponding to these areas in the final sequence alignment score. Generally, the gap cost charges a large initial penality for the existence of a gap, and smaller penalities for each individual residue. This takes into account that each mutational event can insert or delete multiple residues at a time, and thus the bulk of the gap cost penalty is for the existence of the mutation itself, not the length. This also allows matches to extend across short gaps and still appear as contiguous in the sequence alignment.
RepeatMasker screens the input sequence and masks low-complexity regions and interspersed repeats. Matches to unmasked repeat sequences are the most common reason for incorrectly assigned protein function. Using a tool such as this before a database search avoids many false positive matches to non-specific sequences. As a result, matches to unique sequences will predominate in the search output. RepeatMasker Documentation
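The effect of masking on the query can be sketched as below. This does not detect repeats and is not RepeatMasker itself; it only shows the masking step, assuming the repeat coordinates are already known and that masked bases are replaced by "N". The function name `mask_regions`, the sequence, and the coordinates are hypothetical.

```python
def mask_regions(sequence, regions, mask_char="N"):
    """Replace each (start, end) region, given as 1-based inclusive coordinates, with mask_char."""
    chars = list(sequence)
    for start, end in regions:
        for i in range(start - 1, min(end, len(chars))):
            chars[i] = mask_char
    return "".join(chars)


# Hypothetical query with a simple AT repeat at positions 7 to 16.
masked = mask_regions("ATGGCGATATATATATGGCC", [(7, 16)])
print(masked)
```

Because the masked stretch no longer resembles any database sequence, it cannot seed spurious matches, while the unmasked flanking sequence is searched as usual.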
BEAUTY (BLAST Enhanced Alignment Utility) Post-Processor incorporates additional information directly into the BLASTX search output, including the locations of any annotated domains in the database sequences matched by the BLASTX search. Regions of sequence matches (local hits) that overlap annotated domains may have the same function as the domain. Thus, this information improves the identification of weak but functionally significant sequence matches. Example of BEAUTY output
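The core check, whether a local hit overlaps an annotated domain on the matched database sequence, amounts to an interval comparison. The sketch below is only an illustration of that idea and not BEAUTY's code or output format; the function name `overlapping_domains`, the domain names, and all coordinates are hypothetical.

```python
def overlapping_domains(hit_start, hit_end, domains):
    """Return (name, overlap_length) for each annotated domain that the hit overlaps.

    Coordinates are 1-based inclusive positions on the matched database protein.
    """
    found = []
    for name, dom_start, dom_end in domains:
        overlap = min(hit_end, dom_end) - max(hit_start, dom_start) + 1
        if overlap > 0:
            found.append((name, overlap))
    return found


# Hypothetical annotation for one database protein, and one local hit at residues 280-350.
domains = [("kinase domain", 50, 300), ("SH2 domain", 320, 410)]
hits = overlapping_domains(280, 350, domains)
print(hits)
```

A hit that falls largely inside a domain is a stronger hint of shared function than one that merely touches its edge, so reporting the overlap length alongside the domain name is a natural choice.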