PFMAKE help
A brief introduction to PFMAKE
The program PFMAKE, part of the PFTOOLS package, generates a PROSITE profile (generalized profile) from a multiple sequence alignment.
The generalized profile syntax combines the functions of a variety of motif descriptors implemented in other methods, including regular
expression-like patterns, weight matrices, "classical" profiles and certain types of hidden Markov models (HMM). A generalized profile
can be used in homology detection by comparing a query profile against a DNA or protein sequence library.
For further details about generalized profiles and the PFTOOLS package,
see the ISREC Profile Homepage.
Availability in NPS@
PFMAKE is available:
- At URL: https://npsa-prabi.ibcp.fr/NPSA/npsa_pfmake.html.
  The input multiple protein alignment must be in CLUSTALW format.
  For PFMAKE, the standard character set for proteins is {A, B, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y, Z}.
- After a multiple alignment with CLUSTALW.
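Before submitting an alignment, it can help to check that it only uses the character set listed above. The following Python sketch is not part of PFMAKE or NPS@; the function names are hypothetical, the CLUSTAL parser is deliberately minimal, and treating both "-" and "." as gap characters is an assumption.

```python
# Character set quoted in the help text above.
PFMAKE_ALPHABET = set("ABCDEFGHIKLMNPQRSTVWYZ")
GAP_CHARS = set("-.")  # assumption: both symbols are treated as gaps


def read_clustal(text):
    """Minimal CLUSTAL parser (hypothetical helper): returns {name: aligned_sequence}."""
    seqs = {}
    for line in text.splitlines():
        # Skip blank lines, the header, and the conservation line (starts with a space).
        if not line.strip() or line.startswith("CLUSTAL") or line[0].isspace():
            continue
        parts = line.split()
        name, chunk = parts[0], parts[1]
        seqs[name] = seqs.get(name, "") + chunk
    return seqs


def unexpected_residues(seqs):
    """Map each sequence name to the sorted characters outside the alphabet."""
    report = {}
    for name, seq in seqs.items():
        bad = set(seq.upper()) - PFMAKE_ALPHABET - GAP_CHARS
        if bad:
            report[name] = sorted(bad)
    return report


example = """CLUSTAL W (1.83) multiple sequence alignment

seq1      MKV-LAXG
seq2      MKVALA-G
          *** **
"""
aln = read_clustal(example)
print(unexpected_residues(aln))
```

In this toy alignment, seq1 contains an X, which is not in the listed character set, so it is the only sequence reported.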
Parameters
First, choose the weighting parameters: the number of shuffles to be performed per sequence,
the total weight and the gap excision threshold (the minimal fraction of non-gap characters that a column of the multiple
sequence alignment must contain in order to be considered for weighting). The weighting algorithm implemented corresponds to the method of Sibbald
and Argos (1990).
Next come the profile construction options: the alignment boundaries, the score matrix, the
alignment mode, the block profile mode (this mode produces profiles that favor alignments with insertions and deletions positioned
symmetrically around a few positions), the endgap and gap weighting modes, and the circular profile
mode.
Finally, you can set the profile construction parameters:
- Multipliers: the score matrix multiplier and the output score multiplier.
- Gap penalties: the gap opening penalty, the gap extension penalty, the maximum gap penalty multiplier and the
gap penalty multiplier increment. For further details about these parameters, see Gribskov et al. (1990).
- Gap region threshold: the minimal fraction of gap characters a column of the multiple alignment must contain in order
to be considered part of a gap region.
- Gap excision threshold: the minimal fraction of non-gap characters a column of the multiple alignment must contain in
order to be converted into a match position.
- High-cost initiation/termination score: this score is applied to all external and internal initiation and termination scores
corresponding to path matrix positions where initiation or termination at low cost is not possible according to the alignment mode specified.
To allow initiation or termination at these positions, increase this parameter; for example, you can take H=0.6. By default H=* (low value).
- Low-cost initiation/termination score: this score is applied to all external and internal initiation and termination scores
corresponding to path matrix positions where initiation or termination at low cost is possible according to the alignment mode specified.
By default L=0.
The scaling parameters (size of the random database, upper and lower thresholds) do not need to be changed.
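To make the two column thresholds more concrete, the Python sketch below computes the fraction of non-gap characters in each alignment column and flags which columns would pass a gap excision threshold or a gap region threshold. It is only an illustration, not PFMAKE's implementation: the function names and threshold values are hypothetical, every sequence is counted equally (sequence weights are ignored), and using ">=" for the comparisons is an assumption.

```python
GAP_CHARS = set("-.")  # assumption: both symbols are treated as gaps


def non_gap_fractions(rows):
    """Fraction of non-gap characters in each column (unweighted)."""
    ncols = len(rows[0])
    if any(len(r) != ncols for r in rows):
        raise ValueError("all aligned sequences must have the same length")
    return [
        sum(r[i] not in GAP_CHARS for r in rows) / len(rows)
        for i in range(ncols)
    ]


def classify_columns(rows, excision, gap_region):
    """Flag columns against the two thresholds described above.

    excision:   minimal non-gap fraction for a match position
    gap_region: minimal gap fraction for a gap-region column
    """
    result = []
    for i in range(len(rows[0])):
        gaps = sum(r[i] in GAP_CHARS for r in rows)
        result.append({
            "match": (len(rows) - gaps) / len(rows) >= excision,
            "gap_region": gaps / len(rows) >= gap_region,
        })
    return result


rows = ["MKV-LA", "MK--LA", "MKVALA", "M---LA"]
fractions = non_gap_fractions(rows)
flags = classify_columns(rows, excision=0.5, gap_region=0.7)  # hypothetical values
for i, (f, c) in enumerate(zip(fractions, flags), start=1):
    print(i, f, c)
```

With these illustrative values, the fourth column (one residue out of four sequences) falls below the excision threshold and is the only column flagged as part of a gap region.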
NPS@ PFMAKE output
The profile produced by PFMAKE can be used directly to continue your analysis with a profile search against a protein library using
PFSEARCH.
References