PRSS3 - evaluates the significance of a protein sequence alignment


prss3 evaluates the significance of a protein or DNA sequence similarity score. It first compares the two sequences and calculates optimal similarity scores using the Smith-Waterman algorithm, then repeatedly shuffles the second sequence and recalculates the optimal scores against each shuffled copy. An extreme value distribution is then fitted to the shuffled-sequence scores. The characteristic parameters of that distribution are used to estimate the probability that each of the unshuffled sequence scores would be obtained by chance in one sequence, or in a number of sequences equal to the number of shuffles.
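The shuffle-and-fit procedure can be sketched in Python as follows. This is a minimal illustration and not the prss3 implementation: the function names (sw_score, shuffled_scores, fit_gumbel_moments, tail_probability) are hypothetical, the scoring uses simple match/mismatch values with a single linear gap penalty instead of a scoring matrix with separate gap opening and extension penalties, and the extreme value (Gumbel) distribution is fitted by the method of moments, which is an assumption made here for brevity.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def sw_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (Smith-Waterman, linear gap penalty).

    The match/mismatch/gap values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def shuffled_scores(seq1, seq2, n_shuffles, rng):
    """Score seq1 against n_shuffles uniformly shuffled copies of seq2."""
    residues = list(seq2)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        scores.append(sw_score(seq1, "".join(residues)))
    return scores


def fit_gumbel_moments(scores):
    """Estimate Gumbel location and scale by the method of moments."""
    n = len(scores)
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    if var == 0:
        raise ValueError("shuffled scores are all identical; cannot fit")
    scale = math.sqrt(6 * var) / math.pi
    location = mean - EULER_GAMMA * scale
    return location, scale


def tail_probability(score, location, scale):
    """P(S >= score) for one comparison under the fitted Gumbel."""
    return 1.0 - math.exp(-math.exp(-(score - location) / scale))
```

Under these assumptions, calling fit_gumbel_moments on the output of shuffled_scores(a, b, 200, random.Random(1)) and passing sw_score(a, b) to tail_probability gives the chance probability for one sequence; multiplying that probability by 200 gives the expectation in a number of sequences equal to the number of shuffles.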

This program is derived from rdf2, described by Pearson and Lipman, PNAS (1988) 85:2444-2448, and Pearson (Meth. Enz. 183:63-98). Use of the extreme value distribution for estimating the probabilities of similarity scores was described by Karlin and Altschul, PNAS (1990) 87:2264-2268. The 'z-values' calculated by rdf2 are not as informative as the P-values and expectations calculated by prdf. prss3 calculates optimal scores using the same rigorous Smith-Waterman algorithm (Smith and Waterman, J. Mol. Biol. (1981) 147:195-197) used by the ssearch3 program.

prss3 also allows a more sophisticated shuffling method: residues can be shuffled within a local window, so that the order of residues 1-10, 11-20, etc., is destroyed, but a residue in the first 10 is never swapped with a residue outside the first 10, and so on for each local window.
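A windowed shuffle of this kind might look like the following Python sketch. The function name window_shuffle and its parameters are hypothetical, and the code assumes non-overlapping windows that start at the first residue, with a shorter final window when the sequence length is not a multiple of the window size; how prss3 itself handles the last window is not stated here.

```python
import random


def window_shuffle(seq, window, rng):
    """Shuffle residues only within consecutive windows of the given size.

    Residues never move from one window to another, so local
    composition is preserved while local order is destroyed.
    """
    if window <= 0:
        raise ValueError("window must be a positive integer")
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)  # in-place shuffle of this window only
        out.extend(block)
    return "".join(out)
```

With window=10, residues 1-10 are permuted among themselves, then residues 11-20, and so on. Setting the window to the full sequence length reduces this to an ordinary uniform shuffle.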

This program is part of the FASTA package of sequence analysis programs.

  • Usage: Paste your two sequences, each in one of the supported formats, into the sequence fields below
    and press the "Run PRSS" button.
    Make sure that both format buttons (next to the sequence fields) show the correct formats.

Number of shuffles:

Window size:

Scoring matrix:

Gap opening penalty:

Gap extension penalty:

First sequence title (optional):

Input sequence format

1st Query sequence:
or ID or AC or GI
(see above for valid formats)

Second sequence title (optional):

Input sequence format

2nd Query sequence:
or ID or AC or GI
(see above for valid formats)