GeneMatcher server: Smith-Waterman vs BLAST

When to use the GeneMatcher (Smith-Waterman)?

The Smith-Waterman (S-W) search algorithm used by the FDF (Fast Data Finder) server is about 5% more sensitive to divergent matches than the BLAST algorithm. Using it therefore improves your chances of finding distant homologues of your query sequence in the databases.
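
To make the idea of a local alignment score concrete, here is a minimal sketch of the textbook Smith-Waterman recurrence in Python. It is not the FDF implementation: the function name, the simple match/mismatch scores, and the linear gap penalty are assumptions chosen for illustration, whereas real protein searches use a substitution matrix and usually affine gap penalties.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return the best local alignment score between strings a and b.

    Illustrative scoring only (assumed values): a fixed match/mismatch
    score and a linear gap penalty.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = prev[j - 1] + pair   # align a[i-1] with b[j-1]
            up = prev[j] + gap          # gap in b
            left = curr[j - 1] + gap    # gap in a
            # Flooring at 0 lets an alignment restart anywhere (local alignment).
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TGTTACGG", "GGTTGACTA", match=3, mismatch=-3, gap=-2))
```

Every cell of the matrix is evaluated, so no candidate alignment is skipped. That exhaustive evaluation is where the extra sensitivity comes from, and also why the method is slower than a heuristic search.
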
Furthermore, the FDF software incorporates a frameshift-tolerant search algorithm. This feature is particularly useful when searching for potential coding sequences in low-quality DNA sequences, such as those found in EST databases. The sequences in those databases often contain sequencing errors, including insertions and deletions, which shift the reading frame and can eliminate interesting targets from a BLAST search.
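
The effect of a frameshift can be illustrated with a short Python sketch. It only shows why a single missing base disrupts a fixed-frame translation; it does not reproduce the FDF frameshift-tolerant algorithm. The `translate` helper and the example sequence are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, frame=0):
    """Translate dna in forward frame 0, 1 or 2; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE[c] for c in codons)


# Made-up coding sequence, used only for this illustration.
clean = "ATGGCCATTGTAATGGGCCGC"
# Simulate a sequencing error: drop the base at index 7.
damaged = clean[:7] + clean[8:]

if __name__ == "__main__":
    print("clean,   frame 0:", translate(clean))
    for frame in range(3):
        print(f"damaged, frame {frame}:", translate(damaged, frame))
```

After the deletion, the start of the protein is still readable in frame 0, but the remainder is only readable in frame 2. A search that stays in one frame sees two unrelated fragments, while a frameshift-tolerant search can follow the protein across the frame change.
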
 
You should therefore prefer the FDF server:
  • if you are looking for a protein distantly related to your query sequence -> SWP
    e.g., you have a known protein sequence and you want to find possible distant homologues
  • if you are looking for the protein encoded in your low-quality DNA query sequence -> SWX
    e.g., you have a badly sequenced cDNA clone
  • if you are looking for a DNA sequence corresponding to your protein query sequence -> TSWN
    e.g., you want to identify potential homologues of your protein in the EST databases
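
The three cases above can be summarized as a small lookup, sketched here in Python. The mode names come from the list; the `choose_mode` helper and the "protein"/"dna" labels are hypothetical and are not part of the server's interface.

```python
# One reading of the list above: (query type, database type) -> search mode.
SEARCH_MODES = {
    ("protein", "protein"): "SWP",   # distant protein homologues
    ("dna", "protein"): "SWX",       # protein encoded in a low-quality DNA query
    ("protein", "dna"): "TSWN",      # DNA (e.g. EST) matching a protein query
}


def choose_mode(query_type, database_type):
    """Return the search mode name for a query/database combination."""
    key = (query_type.lower(), database_type.lower())
    if key not in SEARCH_MODES:
        raise ValueError(f"no search mode listed for {key}")
    return SEARCH_MODES[key]


if __name__ == "__main__":
    print(choose_mode("DNA", "protein"))
```
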
 

When to use BLAST?

The BLAST search algorithm is designed to find close matches rapidly. It is faster than the S-W algorithm.
You should therefore prefer the BLAST server:
  • if you are looking for close matches and do not mind missing more distantly related sequences
  • if you want a quick answer

For more information see:

Genomics 38, 179-191, 1996.
Article no. 0614.

Sensitivity and Selectivity in Protein Similarity Searches:
Comparison of Smith-Waterman in Hardware to BLAST and FastA

Eugene G. Shpaer,* Max Robinson, David Yee, James D. Candlin,* Robert Mines,* and Tim Hunkapiller

*Perkin-Elmer, Applied Biosystems Division, Foster City, California  
University of Washington, Seattle, Washington

To predict the functions of a possible protein product of any new or uncharacterized DNA sequence, it is important first to detect all significant similarities between the encoded amino acid sequence and any accumulated protein sequence data. We have implemented a set of queries and database sequences and proceeded to test and compare various similarity search methods and their parameterizations. We demonstrate here that the Smith-Waterman (S-W) dynamic programming method and the optimized version of FASTA are significantly better able to distinguish true similarities from statistical noise than is the popular database search tool BLAST. Also, a simple "log-length normalization" of S-W scores based on the query and target sequence lengths greatly increased the selectivity of the S-W searches, exceeding the default normalization method of FASTA. An implementation of the modified S-W algorithm in hardware (the Fast Data Finder) is able to match the accuracy of software versions while greatly speeding up its execution. We present here the selectivity and sensitivity data from these tests as well as results for various scoring matrices. We present data that will help users to choose threshold score values for evaluation of database search results. We also illustrate the impact of using simple-sequence masking tools such as SEG or XNU.