BCM Search Launcher - Frequently Asked Questions

Questions or Comments?

Questions answered:


Why have regions of my sequence queries been replaced with runs of X's or N's?

For several of the sequence database search programs used here (including BLAST, BEAUTY, and FASTPAT), each query sequence is filtered for "low-entropy/low-complexity" regions prior to searching. The search options or parameters used for each program can be viewed using the [P] link on the search page. That page lists the type of filter that is used for each search.

For protein sequence searches, filtering is performed using the XNU and SEG filters. These filters convert low-complexity regions (such as runs of Q's) to X's -- see the description of the '-filter' option in the BLAST Help Pages. Low-complexity filtering is also described in Altschul et al., (1994) Nature Genetics 6:119-129.
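As a rough illustration of what this kind of filtering does, the Python sketch below masks every window whose residue composition has low Shannon entropy. It is not the SEG or XNU algorithm. The function names, the window length of 12, and the 2.2-bit threshold are assumptions chosen for the example only.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (bits per residue) of the characters in a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.2,
                        mask_char: str = "X") -> str:
    """Replace every residue covered by a low-entropy window with mask_char.

    window and threshold are illustrative values, not the SEG/XNU defaults.
    Sequences shorter than the window are returned unchanged (upper-cased).
    """
    seq = seq.upper()
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            # Mask the whole window, so neighbouring windows can overlap.
            for i in range(start, start + window):
                masked[i] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    query = "MKVLAAGIVGLCA" + "Q" * 20 + "RHWSTDEYFNPK"
    print(mask_low_complexity(query))
```

With this sketch the run of Q's is replaced by X's, and a few flanking residues may be masked as well because windows overlap the edge of the run. Passing mask_char="N" gives the analogous behaviour for a DNA query.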

For DNA query sequences, filtering is performed using the RepeatMasker program. The types of repeats are reported, and the repeats and low-complexity regions are converted to N's.

To turn the filtering option off, submit your search using the full-options search form (the [O] link on the search page).

We highly recommend using the filtering option when performing sequence searches. If a sequence contains a low-entropy region and filtering is not used, the search results can be overwhelmed by large numbers of spurious matches to low-complexity regions.


My sequence is shorter than 7kb, why can't I do a search?

Either the browser that you use or the format of your sequence file may be at fault. Internet Explorer browsers have a much lower effective size limit than Netscape browsers. Some users have found that sequences over 2kb do not work with Internet Explorer.

Since the 7kb sequence size limit is really measured in bytes rather than basepairs, characters like spaces and carriage returns count toward the size limit. For this reason, it is best to use a compact sequence format.
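To see how much of the limit is used up by formatting, a small Python sketch along these lines can compare the bytes submitted with the number of residues. It assumes the 7kb limit corresponds to 7000 bytes and that the sequence is plain ASCII text. The function names are hypothetical.

```python
SIZE_LIMIT_BYTES = 7000  # assumed byte value of the 7kb limit


def submitted_size(text: str) -> int:
    """Bytes occupied by the pasted text, whitespace and line ends included."""
    return len(text.encode("ascii", errors="replace"))


def residue_count(text: str) -> int:
    """Number of non-whitespace characters in the pasted text."""
    return sum(1 for ch in text if not ch.isspace())


def fits_limit(text: str, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """True when the text, exactly as pasted, stays within the byte limit."""
    return submitted_size(text) <= limit


if __name__ == "__main__":
    pasted = "ACGT ACGT\r\nACGT\r\n"
    print(submitted_size(pasted), "bytes for", residue_count(pasted), "residues")
```

The gap between the two numbers is the overhead that a compact sequence format avoids.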


Search Launcher returned a 500 server error, what should I do?

This generally indicates a problem with the Search Launcher itself, and we appreciate any information you can provide to help us resolve it. Two pieces of information are vital: the time of the error and your host.

Please be as precise as possible with the time: the minute or even the second is ideal, and be sure to specify the time zone. Your host is the hostname or IP address of the system you are using.

Additional information that can be helpful includes the search performed, any non-default parameters used, and any data submitted.
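One way to collect these details right after the error appears is sketched below in Python. The function name and the layout of the report are assumptions, not a format the Search Launcher requires. The hostname reported is the one your own system knows itself by, which may differ from what the server sees if you connect through a proxy.

```python
import socket
from datetime import datetime, timezone


def error_report_header(search: str = "") -> str:
    """Build the time and host lines for a problem report."""
    # Local time to the second, with an explicit UTC offset as the time zone.
    now = datetime.now(timezone.utc).astimezone()
    lines = [
        "Time: " + now.isoformat(timespec="seconds"),
        "Host: " + socket.gethostname(),
    ]
    if search:
        lines.append("Search: " + search)
    return "\n".join(lines)


if __name__ == "__main__":
    print(error_report_header("BLASTP"))
```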

We would like to thank you for taking the time to report a difficulty.


The server returned a "sequence too long" error even though my sequence is short, what is wrong?

This has been a frequent problem with the Internet Explorer browser. If you are using this browser, we recommend that you try Netscape Navigator.

The limit for many searches is 7000 bytes. Sequences that include numbers and spaces (like a sequence pasted from GCG) can reach the 7000 byte limit even when the sequence itself is much shorter than 7kb (the posted size limit). To avoid this error with sequences of 5-7kb, remove the extra spaces, carriage returns, and numbers from the sequence before you submit the search.
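The cleanup can be done by hand or with a short script. The Python sketch below assumes the input is raw sequence text only (no FASTA header or other annotation lines, which it would damage), and the function name is hypothetical.

```python
import re


def compact_sequence(text: str) -> str:
    """Drop digits and all whitespace, keeping only the sequence characters."""
    return re.sub(r"[\d\s]+", "", text)


if __name__ == "__main__":
    pasted = "       1  ACGTACGTAC GTACGTACGT\n      21  TTGACC\n"
    print(compact_sequence(pasted))
```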


The Mac Search Launcher Batch Client (MSLBC) returns a "Document Not Dated" message, what is wrong?

Our recent upgrade to the Apache 1.3 server software caused a problem with the Batch Client for the Macintosh. Please download the mac-search-launcher-2.6.pl Perl script. The Mac Perl script does not download properly when you click the ftp site link in Netscape. Instead, use Fetch or another (more generic) ftp program to connect to star.bcm.tmc.edu with the username anonymous and your email address as the password, and retrieve the script from the pub/software/search-launcher directory.


The Mac Search Launcher Batch Client (MSLBC) does not accept my sequences, what is wrong?

You are experiencing this difficulty if you keep getting the help message even when you drop sequences on the MSLBC. First, make sure you are running AppleScript. AppleScript is part of System 7.5 but an extra for older systems. The MSLBC is a MacPerl script and MacPerl 4.1.8 requires AppleScript.

If you are running AppleScript and still having this difficulty, your next best option is to install MacPerl and create a droplet. MacPerl is available here. Once you have installed MacPerl, grab a copy of the Search Launcher Batch Client script here. Start up MacPerl, open the script, then save it as a droplet under a name like SearchLauncher. Drop your sequences on this newly created droplet and you should be asked for the sequence type. If you still get the help message please report the difficulty by sending email.

If you were having difficulty with the original MSLBC but the droplet worked, you will need to leave MacPerl installed on your Mac. If you do not intend to use MacPerl, you might try creating your own runtime instead of a droplet by choosing 'runtime' instead of 'droplet' when you save the script from MacPerl. If this works, there is no need to keep MacPerl installed. Please email us if a runtime you create works: this approach has not worked yet, and we would be interested in hearing from you.


The Mac Search Launcher Batch Client (MSLBC) returned a Bad Address error, what should I do?

There is a limit on the length of file names on the Mac and PowerPC. Use a shorter sequence name so that the "sequence-name.search-tool-name.database-name.html" file name is not over about 30 characters. Limiting your sequence name to 10 characters will allow most searches.
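The length can be checked before a search is submitted. The Python sketch below treats "about 30 characters" as a hard limit of 30, which is an approximation. The function names and the tool and database names in the example are hypothetical.

```python
MAX_NAME_LENGTH = 30  # "about 30 characters"; treated as exact for this sketch


def output_file_name(sequence_name: str, tool: str, database: str) -> str:
    """Compose the sequence-name.search-tool-name.database-name.html name."""
    return f"{sequence_name}.{tool}.{database}.html"


def name_is_safe(sequence_name: str, tool: str, database: str) -> bool:
    """True when the composed file name stays within the assumed limit."""
    return len(output_file_name(sequence_name, tool, database)) <= MAX_NAME_LENGTH


if __name__ == "__main__":
    for name in ("myseq", "my_very_long_sequence_name"):
        print(output_file_name(name, "blastp", "nr"), name_is_safe(name, "blastp", "nr"))
```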


Can the Search Launcher Batch Client (SLBC) be used behind a firewall?

Yes, the Search Launcher Batch Client (SLBC) can be used behind a firewall or with a proxy. At this time, for both the Mac and unix systems, the hostname and port of the firewall/proxy must be set in the SLBC perl script by editing the script itself. Version 2.0 or greater of the SLBC perl script is required; it can be obtained here.

To make this script work, be sure you have the hostname and port values of the firewall/proxy. If you are unsure of the port value use the default of 80. In the script look for the word 'proxy' (around line 75). Put your hostname between the two single quote characters towards the end of this proxy_host line, similarly for the proxy_port line. What you need to change looks like ...

$proxy_host = '';
$proxy_port = '';

... and for a hostname (an IP address can also be used) of 'myhost' and a port of '80' should be changed to ...

$proxy_host = 'myhost';
$proxy_port = '80';

On unix systems, just edit the script using your favorite editor and put it wherever you want to run it from. See the README for more information.
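If the same change has to be made to several copies of the script, the edit can also be scripted. The Python sketch below assumes the two assignments appear in the script exactly in the single-quoted form shown above. The function name is hypothetical, and the sketch works on the script text only, so reading and writing the file is left to the caller.

```python
import re


def set_proxy(script_text: str, host: str, port: str = "80") -> str:
    """Fill in the $proxy_host and $proxy_port assignments in the script text."""
    for name, value in (("proxy_host", host), ("proxy_port", port)):
        if "'" in value:
            raise ValueError("value must not contain a single quote: " + value)
        # Match e.g.  $proxy_host = '';  at the start of a line.
        pattern = re.compile(
            r"^([ \t]*\$" + name + r"[ \t]*=[ \t]*)'[^']*';", re.MULTILINE
        )
        script_text, count = pattern.subn(
            lambda m, v=value: m.group(1) + "'" + v + "';", script_text
        )
        if count == 0:
            raise ValueError("no $" + name + " assignment found in the script")
    return script_text


if __name__ == "__main__":
    original = "$proxy_host = '';\n$proxy_port = '';\n"
    print(set_proxy(original, "myhost", "80"))
```

Raising an error when an assignment is not found is a deliberate choice here, so that a script in an unexpected layout is not silently left without proxy settings.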

On a Mac you can use SLBC or MacPerl to make the change. If you want to use SLBC, install it first (see the README for information on obtaining and installing SLBC). Then start SLBC without dropping any files on it. It will display a window that says return to quit. Do not hit return; instead, go to the File menu of MacPerl (SLBC is MacPerl with the Search Launcher perl script loaded) and open the new script. Make the necessary changes (unless it has already been changed using another method), then do a 'save as' and save the modified script under some name as a runtime. You can also use MacPerl if you already have it installed or want to obtain it here. Just start up MacPerl, open the script, make any necessary changes, then save it as a droplet.


Search Launcher Batch Client (SLBC) BEAUTY Post Processing - What BLAST program and settings should I use to get the best results?

BEAUTY post-processing using the Search Launcher Batch Client version 2.0 provides the summary figure and hypertext links to additional information for most BLASTP and BLASTX search results, including text-formatted results returned by the NCBI's BLAST server or produced by the NCBI's BLAST Client.

  • For the best results with the NCBI BLAST Client and any protein database:
    1) Use either BLASTP or BLASTX from the NCBI's BLAST Client (instructions for installing this are available here).
    2) Use any protein or DNA query sequence, and name the sequence file name.fa or name.seq.
    3) Search any BLAST searchable protein database, including local, private, or public databases.
    4) Use the -gi option.
    5) Use the -filter seg+xnu -echofilter options.
    6) Set B equal to V, e.g. with the -B=50 -V=50 options.
    7) Write the results to a file in the same directory as the sequence file.

    So a command to run the BLAST client would look like this: blastp mydatabase myseq.fa -gi -filter seg+xnu -echofilter -B=50 -V=50 > myseq.blastp

  • For searches from the NCBI BLAST server:
    1) Use either BLASTP or BLASTX.
    2) Use any protein or DNA query sequence, and name the sequence file name.fa or name.seq.
    3) Search any BLAST searchable protein database.
    4) Select the NCBI-gi option.
    5) Use the -filter seg+xnu options.
    6) Set the number of Descriptions and Alignments to the same number (say 50 and 50); do not use the default.
    7) Save the results in text format to a file in the same directory as the sequence file.

    For all BLAST results files (eg. myseq.blastp):
    8) Start the Search Launcher Batch Client, v. 2.0.
    9) Choose output as the file type.
    10) Choose BEAUTY post processing.
    11) Enter the file(s) (eg. myseq.blastp or *.blastx). The coordinating sequence file(s) (eg. myseq.fa) will be found by the SLBC as long as the prefix of the name matches the output file name, the suffix is either .fa or .seq, and the sequence file and the blast output file are in the same directory (or folder).

    Some of these are firm requirements, and others are suggestions to improve performance. The reasons for these recommendations are discussed below.
    1-3) Since BEAUTY post-processing is only available for protein database searches, use BLASTP for protein queries, BLASTX for DNA queries, and search a protein database. For more information about making a BLAST searchable database, see.
    4) Although BEAUTY can retrieve information to generate hypertext links using accession numbers (by retrieving the associated gi number), it is faster and more reliable if the NCBI's gi numbers are included in the output.
    5) To avoid spurious matches to protein repeats or low-complexity regions, use a sequence filter, and, to see which parts of the sequence have been filtered, echo the filtered sequence.
    6) These numbers need to match for reliable post-processing, although any setting may be used (eg. -B=500 -V=500 or -B=10 -V=10).
    11) The Batch Client expects that the name of the BLAST file and the name of the corresponding sequence file have the same prefix, followed by a period, followed by a suffix. The suffix for the sequence file must be either "fa" or "seq" (eg. myfile.fa or myfile.seq). Also, extra punctuation causes problems (eg. don't use my.file.fa or my_file.fa or my-file.fa).
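The file naming rules in item 11 can be checked before the Batch Client is started. The Python sketch below is not the SLBC's own lookup code; it only mirrors the rules as stated above, and the function names are hypothetical. It assumes "extra punctuation" means anything other than letters and digits in the prefix or suffix.

```python
from pathlib import Path
from typing import Optional

SEQUENCE_SUFFIXES = (".fa", ".seq")


def has_simple_name(path: Path) -> bool:
    """True when the file name is prefix.suffix with no extra punctuation."""
    prefix, dot, suffix = path.name.partition(".")
    return bool(dot) and prefix.isalnum() and suffix.isalnum()


def find_sequence_file(blast_output: Path) -> Optional[Path]:
    """Return the matching .fa or .seq file in the same directory, if any."""
    if not has_simple_name(blast_output):
        return None
    prefix = blast_output.name.partition(".")[0]
    for suffix in SEQUENCE_SUFFIXES:
        candidate = blast_output.with_name(prefix + suffix)
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    result = Path("myseq.blastp")
    match = find_sequence_file(result)
    print(match if match else "no sequence file found for " + result.name)
```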



    Page Curator: Kim C. Worley, Human Genome Center, Baylor College of Medicine (Questions or Comments).
    Last modified: Tue Apr 30 09:11:35 CDT 2002

