Finding a protein using a DNA sequence of known translation frame:


  1. Translate DNA sequence

    Note: Translating the sequence

    Protein-to-protein sequence comparisons are more sensitive than DNA-to-protein sequence comparisons. After running the 6 Frame Translation program, "cut" out the translation in the correct frame and "paste" it into the search input box (see step 4 below).

  2. From the BCM search launcher page choose General protein sequence/pattern searches.

  3. Select the first option of search choices:

    WU-BLASTP+BEAUTY / nr protein - Warren Gish's BLAST with gapped alignments with BEAUTY post-processing (WU/BCM). [H] [O] [P] [E]

    Make sure the selection button (circle or diamond) next to this option is selected.

  4. Enter your amino acid sequence in the input box.

  5. Click "Perform search".
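
The translation in step 1 can be illustrated with a short Python sketch. This is not the code behind the 6 Frame Translation program; it is a minimal example that assumes the standard genetic code, writes stop codons as "*", and writes any codon containing an ambiguous base as "X". The function names and frame labels (+1 to +3, -1 to -3) are chosen here for illustration only.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return a dict mapping frame labels ('+1'..'-3') to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for label, protein in six_frame_translation("ATGGCCTAA").items():
        print(label, protein)
```

When the translation frame is already known, only the matching entry (for example frames["+1"]) needs to be pasted into the search input box.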

Notes on this type of search:

  • WU-BLASTX (Washington University BLASTX) is a search program that compares a DNA sequence, translated in all possible reading frames, to a selected protein database. Thus, it compares the six possible translation products of a nucleotide query sequence (three reading frames on each of the two strands). Translating the DNA sequence prior to searching greatly improves the accuracy of the search, as true sequence similarity is much easier to detect in proteins than in DNA. In DNA, the rapidly mutating third base position of each codon and the large non-coding regions limit the effectiveness of direct DNA sequence comparisons in detecting non-identical matches.

    Other programs that search translated DNA queries against protein databases include BLASTX 2.0 from NCBI and FASTX (University of Virginia).

  • BLAST uses a heuristic search algorithm, starting with a quick, but less accurate, exact-match procedure to identify possible matches from the entire database, and then using the more sensitive but time-consuming Smith-Waterman algorithm to search these matches and create the final output. Details of BLAST search strategy

  • "with gapped alignments" means that this search engine recognizes gaps in sequence alignments and incorporates gap costs corresponding to these areas in the final sequence alignment score. Generally, the gap cost charges a large initial penalty for the existence of a gap, and a smaller penalty for each individual residue in the gap. This takes into account that a single mutational event can insert or delete multiple residues at a time, and thus the bulk of the gap cost is for the existence of the mutation itself, not its length. This also allows matches to extend across short gaps and still appear as contiguous in the sequence alignment.

  • BEAUTY (BLAST Enhanced Alignment Utility) Post-Processor incorporates additional information directly into the BLASTX search output, including information on the locations of any annotated domains among the sequence matches produced by the BLASTX search. Regions of sequence matches (local hits) that overlap annotated domains may have the same function as the domain. Thus, this information improves the identification of weak, but functionally significant, sequence matches. Example of BEAUTY output

  • filters: This search always applies the XNU+SEG filters (click on [P] in the search line for this information). These filters convert low-complexity protein regions (such as runs of Q's) to X's; see the description of the '-filter' option in the BLAST Help Pages.
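
The gap cost scheme described in the notes above (a large charge for opening a gap plus a small charge per residue) is commonly called an affine gap penalty. The Python sketch below illustrates the arithmetic only. The values gap_open=10 and gap_extend=1 are placeholders chosen for this example, not the settings used by WU-BLASTP, and the function name is hypothetical.

```python
def affine_gap_cost(length, gap_open=10, gap_extend=1):
    """Cost of one gap spanning `length` residues.

    gap_open   -- charged once for the existence of the gap (assumed value)
    gap_extend -- charged for every residue in the gap (assumed value)
    """
    if length <= 0:
        return 0
    return gap_open + gap_extend * length


if __name__ == "__main__":
    # One insertion/deletion event covering five residues.
    one_long_gap = affine_gap_cost(5)
    # Five separate events of one residue each.
    five_short_gaps = 5 * affine_gap_cost(1)
    print(one_long_gap, five_short_gaps)
```

With these placeholder values, a single five-residue gap costs far less than five separate one-residue gaps, which reflects the idea that most of the penalty is for the mutational event itself rather than for its length.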


BCM HGSC