README for WU-BLAST2
This is a development version of WU-BLAST (Washington University BLAST) 2.0
software for rapid and sensitive similarity searches of protein and
nucleotide sequence databases. WU-BLAST 2.0 alpha software is copyrighted
and may not be sold or redistributed without the express written consent of
the author, but the software may otherwise be freely used for commercial,
nonprofit, and academic purposes.

The principal new features in WU-BLAST version 2 are:

o Gapped alignments are produced, with potentially multiple regions of
  similarity being found between each pair of sequences. In WU-BLAST 2.0,
  the gapped alignment routines are integral to the database search itself,
  not a post-processing step grafted onto an old BLAST version 1.4 search,
  and thus yield better sensitivity. Each of the version 2.0 programs with
  gaps executes about 10% slower than its version 1.4 counterpart, but
  generally yields more easily interpretable output and much better
  sensitivity than version 1.4.

o Karlin and Altschul (1993) "Sum statistics" are used to evaluate the
  significance of multiple regions of similarity found between the query
  and a database sequence, as described by Altschul and Gish (1996).

New command line options include the following. Terse program usage
information can also be obtained by entering one of the program names on
the command line without arguments.

  Q=#          the penalty for a gap of length 1 (default Q=9)
  R=#          the per-residue penalty for extending a gap (default R=2)
  nogap        do not create gapped alignments, in essence reverting to
               WU-BLAST 1.4 behavior
  gapall       generate a gapped alignment for every HSP found
  gape=<e>     generate gapped alignments for all HSPs between sequences
               whose expected frequency of chance occurrence is less than
               or equal to <e> (default gape=2000)
  gapw=<w>     set the window width within which gapped alignments are
               generated (default gapw=32 for protein comparisons, gapw=16
               for BLASTN)
  gapK=<k>     the value of the Karlin-Altschul statistics' K parameter to
               use when evaluating the significance of gapped alignment
               scores (useful when precomputed values are unavailable in
               the programs' internal tables for the chosen scoring matrix
               and gap penalty combination)
  gapL=<l>     the value of the Karlin-Altschul statistics' lambda
               parameter to use when evaluating the significance of gapped
               alignment scores
  gapH=<h>     the value of the Karlin-Altschul statistics' H parameter to
               use when evaluating the significance of gapped alignment
               scores
  noseqs       produce greatly abbreviated output that omits sequence
               alignments and yet may be interpreted correctly by existing
               parsers
  compat1.4    produce BLAST version 1.4-style output (no gaps) but with
               bug fixes and performance enhancements in place
  hspsepqmax   maximum permitted distance along the query sequence
               separating two consistent HSPs
  hspsepsmax   maximum permitted distance along the subject (database)
               sequence between two consistent HSPs
  gapsepqmax   maximum permitted distance on the query sequence between
               two consistent gapped alignments
  gapsepsmax   maximum permitted distance on the subject sequence between
               two consistent gapped alignments
  mmio         turn off the use of memory-mapped I/O in the reading of
               database files. Use of this option will typically retard
               the search, particularly when multiple processors are being
               used, but serves both to demonstrate the effectiveness of
               this form of I/O and to validate the I/O routines.

o In WU-BLAST 2.0, the BLASTDB environment variable is a colon-delimited
  list of directory names. In UNIX parlance, it is a path. The default
  BLASTDB value is ".:/usr/ncbi/blast/db", so the programs first look in
  the current working directory (".") for the requested database, then in
  the "/usr/ncbi/blast/db" directory.
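
The lookup order for BLASTDB can be illustrated with a small Python sketch.
This is not the programs' actual code: find_database and its cwd argument
are hypothetical names, the database is treated as a single file (a
simplification of how real database files are named), and empty path
entries are simply skipped. The sketch also includes the last-resort check
of the current working directory that this README describes for BLASTDB
values that omit it.

```python
import os
from typing import Optional

# Default search path stated in this README.
DEFAULT_BLASTDB = ".:/usr/ncbi/blast/db"


def find_database(name: str, blastdb: Optional[str] = None,
                  cwd: str = ".") -> Optional[str]:
    """Return the first location of `name` on the BLASTDB path, else None.

    Hypothetical helper for illustration only; `cwd` stands in for the
    current working directory so the lookup can be exercised in tests.
    """
    if blastdb is None:
        blastdb = os.environ.get("BLASTDB", DEFAULT_BLASTDB)

    # Split the colon-delimited list; skipping empty entries is an assumption.
    entries = [d for d in blastdb.split(":") if d]

    # Resolve relative entries such as "." against the working directory.
    dirs = [os.path.normpath(os.path.join(cwd, d)) for d in entries]

    # Last resort: look in the working directory if the path omitted it.
    here = os.path.normpath(cwd)
    if here not in dirs:
        dirs.append(here)

    # The first directory that contains the database wins.
    for directory in dirs:
        candidate = os.path.join(directory, name)
        if os.path.exists(candidate):
            return candidate
    return None
```

For example, find_database("nr", "/data/a:/data/b") would check /data/a,
then /data/b, and finally the current working directory.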

For backwards compatibility with programs that expect BLASTDB to be a
single directory specification rather than a path, the version 2 programs
look in the current working directory as a last resort whenever the user
has set a value for BLASTDB but omitted the current working directory.

BUGS

o Parameters lambda, K and H for gapped alignments are obtained by looking
  up their values in precomputed tables, not by finding solutions to
  analytical equations as is done for ungapped alignments. Thus, values are
  not available for all scoring matrix and gap penalty combinations. When
  appropriate values are unavailable in the precomputed tables, the
  programs issue a WARNING and proceed to execute the database search using
  incorrect values; in such cases, the statistical significance estimates
  reported will usually be highly inaccurate. If the user happens to know
  more appropriate values, then the gapK, gapL and gapH parameters should
  be used to set them.

o When the user selects an alternative scoring matrix, the gap penalties Q
  and R remain unchanged from their default values (unless otherwise
  specified). This can inadvertently yield a situation in which the
  programs do not have appropriate values of lambda, K and H in their
  precomputed tables. As described above, a WARNING message will indicate
  such situations.

o The "hspsepqmax", "gapsepqmax", etc. parameters are measures of distance
  in residues along the sequences in the specific form in which they are
  compared. For instance, in a BLASTX search (conceptually translated
  nucleotide query sequence compared against a protein sequence database),
  hspsepqmax refers to a distance measured in amino acid residues, not the
  underlying nucleotides in the query.

o ASN.1 formatted output is currently broken.

References

Altschul, SF, and W Gish (1996). Local alignment statistics. In: R.
  Doolittle, ed. Methods in Enzymology 266:460-480.

Karlin, S, and SF Altschul (1993). Applications and statistics for multiple
  high-scoring segments in molecular sequences. Proc. Natl. Acad. Sci. USA
  90:5873-5877.